Method of adjusting service function chains to improve network performance

ABSTRACT

Some embodiments of the invention provide a method for monitoring and adjusting a service chain that includes several services to perform on data messages passing through a network. For a service chain implemented by a set of service paths each of which includes several service nodes that implement the services of the service chain, the method receives, from a set of service proxies, operational data relating to data transmission characteristics of a set of operational service nodes. The method analyzes the data transmission characteristics. In response to the analysis of the data transmission characteristics, the method alters the set of service paths implementing the service chain.

RELATED APPLICATIONS

Benefit is claimed under 35 U.S.C. 119(a)-(d) to Foreign application Serial No. 202041002334 filed in India entitled “A METHOD OF ADJUSTING SERVICE FUNCTION CHAINS TO IMPROVE NETWORK PERFORMANCE”, on Jan. 20, 2020, by VMware, Inc., which is herein incorporated in its entirety by reference for all purposes.

The present application (Attorney Docket No. F672.02) is related in subject matter to U.S. patent application Ser. No. 16/843,913 (Attorney Docket No. F672.01), which is incorporated herein by reference.

BACKGROUND

Datacenters today use service chains that designate a series of service operations to perform on data messages of a network. These service chains are implemented by service paths which each define a specific set of service nodes for a set of data to pass through. Each service node implements one of the service operations designated in the service chain. However, the tools presently used to monitor networks do not include interfaces to monitor the implementation of service chains. Therefore, there is a need in the art for a user interface to troubleshoot the service paths and service nodes implementing a service chain. The present art also does not use service proxies to collect data transmission characteristics of individual service nodes. The lack of service proxies collecting the data leads to inadequate monitoring of service nodes in a service chain. Therefore, there is a need in the art for a method that uses service proxies that monitor the individual service nodes to collect operational statistics and modifies the service paths in response to the collected operational statistics.

BRIEF SUMMARY

Some embodiments of the invention provide a method for monitoring a particular service chain that includes several service operations to perform on data messages passing through a network. The method provides a user interface to receive a selection of the particular service chain from a plurality of service chains. The particular service chain is implemented by several service paths, each of which includes several service nodes. Each service node performs one of the services of the service chain. The method generates a graphical display of the service paths through the service nodes, and as part of this graphical display, provides operational data relating to operations of the service nodes of the service paths. In some embodiments, the method also provides, as part of the graphical display, operational data relating to operations of the nodes of an individual service path. In some embodiments, the method further includes defining the plurality of service paths for the particular service chain.

“Operational data” refers herein to data gathered at least partly while service nodes are in operation. The operational data may include operational statistical data derived from one or more measurements (e.g., performance measurements) of the operations of a service node. In some embodiments, operational data may also include data that is not derived from measurements of the service nodes, such as an identifier of the service node, a packet type or packet types handled by the service node, etc.

The graphical display of some embodiments includes a node representation of each service node and a link connecting the node representations of each successive pair of service nodes in at least one service path. In some embodiments, the graphical display includes a concurrent representation of the plurality of service paths. A single service node may provide a particular service for multiple service paths of the service chain in some embodiments. The method of some embodiments updates the graphical display to add a node representation of any newly implemented service node of the displayed service chain or to remove node representations of any service node(s) that is/are no longer implemented.

The method of some embodiments responds to various received user commands. In some embodiments, operational data for a service node is provided after receiving a selection of the service node (e.g., via a cursor click or cursor hover operation). The graphical display of some embodiments provides operational data relating to operations of a set of service nodes on an individual service path after receiving a selection of the individual service path. Similarly, the graphical display of some embodiments receives a selection of a particular service. Once a service is selected, the graphical display then provides operational data relating to the operations of a set of service nodes that each implement instantiations of the particular service.

The method of some embodiments allows a user to set threshold values for some particular performance parameters. For example, the method may allow a user to set a threshold for throughput of the service chain, aggregate latency of all service nodes of a particular service type, etc. Similarly, the method of some embodiments provides an interface option for setting a threshold for a performance parameter for a particular service type. In some embodiments, the method provides a visual cue to indicate whether a performance parameter has gone beyond a threshold. For example, the method may change the color of one or more graphical elements to indicate the performance parameter has gone beyond the threshold.

In addition to monitoring the service chains of a network, the method of some embodiments allows a user to alter the operations of those service chains. For example, the method of some embodiments provides interface options for instantiating an additional service node for a particular service and/or shutting down or restarting an already operating service node.

In the method of some embodiments, data may be received from service proxies that monitor service nodes. A service proxy may be implemented on the same device as a service node it monitors, or on a different device from the service node. In some embodiments, each service node has its own service proxy. In other embodiments, a single service proxy may monitor multiple nodes.

In some embodiments, a service node (sometimes called a “service instance runtime”) is implemented on a device of the network and the operational data includes operational statistics received from a service proxy implemented on the same device. The device may be a service appliance or a host computer of the network. A service appliance is a device designed to provide a particular service or a small set of services rather than operating as a general purpose computing device. Service devices of some embodiments implement their own service proxies in addition to (or as part of) the service they are designed for. A host computer is a physical computer that implements (hosts) virtual machines (VMs) or containers that implement service nodes. VMs and containers are sometimes collectively referred to as “service machines.” Each host computer of some embodiments implements service proxies for the service nodes implemented on that host computer.

In other embodiments, a service proxy is implemented on a separate device from a service node it monitors. For example, service proxies for the service nodes of a service appliance that does not implement a service proxy may be implemented on host computers or other devices of the network. Similarly, the methods of some embodiments may implement a service node on one host machine and a service proxy to monitor that node on another host machine.

Implementing service proxies on the same device as the service nodes they monitor and implementing them on a different device are not mutually exclusive options. For example, some embodiments may implement (1) service proxies on a host computer to monitor the service nodes on that host computer and (2) additional service proxies on the host computer to monitor service nodes that are implemented on one or more appliances that lack service proxies of their own.

In some embodiments, at least part of the operational data is generated from operational statistics. The operational statistics are based on data received from a service proxy that collects data transmission characteristics relating to a particular service node. These operational statistics may be calculated at one or more service proxies, calculated at a server of the network, or calculated partially at a server and partially at one or more service proxies. In some cases, these calculations may include aggregation of data transmission characteristics (e.g., averages, sums, weighted averages, etc.). The data transmission characteristics of some embodiments include a latency and/or a throughput of a service node. The latency is the time for a query to be sent to the service node and a reply received (e.g., received at the service proxy). A throughput value represents the amount of data in the data messages that passes through a service node in a particular amount of time. The operational data of some embodiments includes data received from the service proxies relating to packets received by a particular service node (e.g., packet type, error rate, drop rate, etc.) and/or data relating to payload size.
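As a purely illustrative sketch of how latency and throughput samples might be reduced to operational statistics, the following Python fragment averages per-sample latencies and divides total data by total observation time. The names `Sample` and `summarize`, the field names, and the choice of milliseconds and bits per second are assumptions made for this example and are not taken from the described embodiments.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One hypothetical observation made by a service proxy."""
    latency_ms: float   # time from query sent to reply received
    bytes_seen: int     # data that passed through the node in the interval
    interval_s: float   # length of the observation interval


def summarize(samples):
    """Aggregate raw samples into an average latency and a throughput value."""
    total_bytes = sum(s.bytes_seen for s in samples)
    total_time = sum(s.interval_s for s in samples)
    return {
        "avg_latency_ms": mean(s.latency_ms for s in samples),
        # Throughput: amount of data per unit time, here in bits per second.
        "throughput_bps": total_bytes * 8 / total_time,
    }
```

The same reduction could run at a service proxy, at a server, or be split between the two, as the paragraph above describes.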

The method of some embodiments is implemented on a telecommunications network and the particular service chain includes a set of service operations applied to a particular type of data message flow. For example, the method may be implemented on a 5G network, 4G network, or LTE network.

Some embodiments of the invention provide a method for monitoring and adjusting a service chain that includes multiple services to perform on data messages passing through a network. The method operates on a service chain implemented by multiple service paths each of which includes multiple service nodes that implement multiple services of the service chain. The method receives operational data relating to a set of service nodes among the multiple service nodes. The method analyzes the operational data. In response to the analysis of the operational data, the method alters the service paths implementing the service chain.

The methods of some embodiments alter the service paths in multiple ways. In some embodiments, the method alters the service paths by instantiating a new service node to implement a particular service of the service chain. Instantiating the new service node in some embodiments is in response to operational data indicating that a throughput of one or more service nodes implementing the particular service is beyond a threshold throughput (e.g., lower than a set minimum threshold or above a set maximum threshold). Instantiating the new service node in some embodiments is in response to operational data indicating that a latency of one or more service nodes implementing the particular service is beyond a threshold latency.

The method of some embodiments may alter service paths by deactivating an existing service node that is implementing a particular service. Deactivating the existing service node in some embodiments may be in response to operational data indicating that a throughput of one or more service nodes implementing the particular service is beyond a threshold throughput. In some embodiments, deactivating the existing service node may be in response to operational data indicating that a latency of one or more service nodes implementing the particular service is beyond a threshold latency.

Some embodiments alter service paths by restarting an existing service node that is implementing a particular service. For example, the method of such embodiments may restart a service node in response to operational data indicating errors in the existing service node or in response to operational data indicating that the existing service node has fallen below a performance threshold.
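The threshold-driven alterations described above (instantiating, deactivating, or restarting a service node) can be pictured with a small decision function. This is only one possible policy, sketched under assumptions: the names `ServiceStats`, `Limits`, and `choose_action`, the units, the action labels, and the order in which the conditions are checked are all hypothetical and are not specified by the described embodiments.

```python
from dataclasses import dataclass


@dataclass
class ServiceStats:
    """Hypothetical aggregated statistics for one service."""
    throughput_mbps: float
    latency_ms: float
    error_rate: float


@dataclass
class Limits:
    """Hypothetical user-set or default thresholds."""
    min_throughput_mbps: float
    max_throughput_mbps: float
    max_latency_ms: float
    max_error_rate: float


def choose_action(stats, limits):
    """Pick one alteration; the priority order here is an assumption."""
    if stats.error_rate > limits.max_error_rate:
        return "restart_node"
    if (stats.latency_ms > limits.max_latency_ms
            or stats.throughput_mbps > limits.max_throughput_mbps):
        return "instantiate_node"   # service appears overloaded
    if stats.throughput_mbps < limits.min_throughput_mbps:
        return "deactivate_node"    # service appears underused
    return "no_change"
```

A controller following such a policy would call `choose_action` each time fresh statistics arrive and then apply the returned alteration to the service paths.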

In some embodiments, a particular service node implements parts of multiple service paths of the service chain. In some such embodiments, the method alters the service paths by reducing a number of service paths implemented in part by a particular service node. For example, the method may reduce the number of service paths using a particular node when the node is receiving more than a threshold amount of data in the data messages (e.g., being overloaded). The method of some embodiments alters the service paths by increasing a number of service paths implemented in part by a particular service node. For example, the method may increase the number of service paths using a particular node in response to the particular service node receiving less than a threshold amount of data in the data messages (e.g., being underutilized).

The methods of various embodiments receive and analyze data from various sources and in various ways. In some embodiments, a particular service node is implemented on a device and the operational data for the particular service node is based at least partly on data received from a service proxy implemented on the device. The device in some embodiments may be a service appliance or a host computer of the network. In some embodiments, a particular service node is implemented on a first device and the operational data for the particular service node is based at least partly on data received from a service proxy implemented on a second device.

The method of some embodiments bases the operational data on operational statistics received from service proxies. Analyzing the operational data in some embodiments includes aggregating operational statistics data with previously collected operational statistics data. The aggregation may include calculating a weighted sum of the operational statistics data. The operational statistics of some embodiments include data transmission characteristics of the service nodes received from the service proxies. The data transmission characteristics of a particular service node may include a throughput of the particular service node and/or a latency time for the particular service node.
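One simple way to aggregate newly received statistics with previously collected statistics as a weighted sum is an exponentially weighted average. The sketch below assumes that particular weighting scheme; the function name `aggregate` and the default weight of 0.25 are hypothetical and are chosen only for illustration.

```python
def aggregate(previous, new, weight=0.25):
    """Weighted sum of a previously aggregated value and a new measurement.

    `weight` is the share given to the new measurement; the remainder goes
    to the previously collected value. The value 0.25 is an assumption.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (1.0 - weight) * previous + weight * new
```

For example, with a previous average latency of 10.0 and a new measurement of 20.0, `aggregate(10.0, 20.0)` gives 12.5, so a single outlier shifts the stored value only part of the way.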

In some embodiments, the service proxy collects other data. The service proxy collects data relating to packets received by a particular service node in some embodiments. The service proxy collects data relating to the sizes of payloads received by a particular service node in some embodiments.

Various embodiments include various other elements or limitations. For example, the network of some embodiments is a telecom network and a particular service chain is a set of service functions each applied to a particular type of data message flow. In some embodiments, the method further includes defining the plurality of service paths for a particular service chain. The service nodes of some embodiments include at least one of: a service implemented on a virtual machine (VM), a service implemented on a container, or a service implemented by an appliance.

The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description, the Drawings and the Claims is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.

FIG. 1 illustrates an example of segregated guest and service planes that are implemented in some embodiments by two logical forwarding elements.

FIG. 2 illustrates a data message between two guest virtual machines (GVMs) being redirected along a service path to be processed by service virtual machines (SVMs) of some embodiments.

FIG. 3 conceptually illustrates a relationship between a service chain and a set of one or more service paths that implement the service chain in some embodiments.

FIG. 4 illustrates an example of a service chain and its associated service paths.

FIG. 5 illustrates examples of reverse service paths for the forward service paths illustrated in FIG. 4.

FIG. 6 illustrates an example of input/output (IO) chain components that implement a service plane in some embodiments.

FIG. 7 illustrates a process performed by a service index pre-processor and a service transport layer caller of some embodiments.

FIG. 8 illustrates a data flow example corresponding to the process described in FIG. 7.

FIG. 9 illustrates an operation of a port proxy of some embodiments for formatting a data message for forwarding by a first service node.

FIG. 10 conceptually illustrates a process of some embodiments for passing a data message in a service path to a next hop.

FIG. 11 illustrates a process that the service proxy of FIG. 6 performs in some embodiments each time it receives a data message traversing along an ingress path of a service node.

FIG. 12 illustrates an example of a data message in some embodiments being forwarded from a first hop service node to a second hop service node.

FIG. 13 conceptually illustrates a process that a service proxy performs in some embodiments each time it receives a data message traversing along an egress path of its service node.

FIG. 14 conceptually illustrates a process started by an encap processor on a next hop computer that receives an encapsulated data message that needs to be processed by an SVM executing on its computer.

FIG. 15 illustrates an example of a data message in some embodiments being forwarded from a second hop service node to a third hop service node.

FIG. 16 illustrates an example of a data message in some embodiments being forwarded from a third hop service node back to a first hop service node.

FIG. 17 conceptually illustrates a process that a service index post-processor performs in some embodiments.

FIG. 18 conceptually illustrates several operations that network managers and controllers perform in some embodiments to define rules for service insertion, next service hop forwarding, and service processing.

FIG. 19 illustrates how service paths are dynamically modified in some embodiments.

FIG. 20 illustrates an example of a graphical user interface (GUI) of some embodiments that displays elements implementing a service chain.

FIG. 21 illustrates a GUI displaying operational data for a service node in response to a user selection of the node.

FIG. 22A illustrates a GUI displaying operational data for the nodes along a service path in response to a user selection of the service path.

FIG. 22B illustrates a GUI displaying a visual indicator of a poorly functioning node.

FIG. 23 illustrates the example GUI displaying operational data for the nodes that provide a particular service in response to a user selection of one of the nodes implementing the service.

FIG. 24 illustrates a GUI adding an additional service node for the selected service in response to a user command.

FIG. 25 illustrates a process of some embodiments for selecting a service chain then receiving and displaying operational data for components that implement that service chain.

FIG. 26 illustrates an example of a service proxy monitoring a service node.

FIG. 27 conceptually illustrates a process of some embodiments for calculating and reporting statistics of a service node at a service proxy.

FIG. 28 conceptually illustrates a process of some embodiments for receiving, storing, and analyzing, at a central controller, statistics from a service proxy.

FIG. 29 illustrates a process for automatically restarting/remediating an operational service node when the particular service node fails to meet a threshold.

FIG. 30 illustrates a process for automatically adding a new service node when the performance of the service nodes implementing a particular service fails to meet a threshold.

FIG. 31 illustrates a process for automatically shifting data traffic to underused nodes of a service or recovering resources from an underused node of a service.

FIG. 32 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.

DETAILED DESCRIPTION

In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.

Some embodiments of the invention provide a method for monitoring a particular service chain that includes several service operations to perform on data messages passing through a network. The method provides a user interface to receive a selection of the particular service chain from a plurality of service chains. The particular service chain is implemented by several service paths, each of which includes several service nodes. Each service node performs one of the services of the service chain. The method generates a graphical display of the service paths through the service nodes, and as part of this graphical display, provides operational data relating to operations of the service nodes of the service paths. In some embodiments, the method also provides, as part of the graphical display, operational data relating to operations of the nodes of an individual service path. In some embodiments, the method further includes defining the plurality of service paths for the particular service chain.

In some embodiments, a service node performs a middlebox service operation, such as a firewall, an intrusion detection system, an intrusion prevention system, a load balancer, an encryptor, a message monitor, a message collector, or any number of other middlebox services. In some embodiments, multiple instances of a category of services may be implemented as separate operations. For example, a service chain may include two separate firewalls from two different vendors or multiple different virus/malware scanners.

“Operational data” refers herein to data gathered at least partly while service nodes are in operation. The operational data may include operational statistical data derived from one or more measurements (e.g., performance measurements) of the operations of a service node. In some embodiments, operational data may also include data that is not derived from measurements of the service nodes, such as an identifier of the service node, a packet type or packet types handled by the service node, etc.

Although the discussion below describes receiving the operational data from service proxies in some embodiments, one of ordinary skill in the art will understand that other embodiments receive operational data relating to each node from other system elements. The operational data in some embodiments may include raw data and/or calculated data. Measured values and values calculated from those measured values may be referred to herein as operational statistics.

In some embodiments, the user interface and automatic monitoring of the present invention operate in a network that uses service proxies to monitor service nodes that sequentially implement a set of service paths of a service chain. The service chains, service paths, service nodes, network, and host machines that are monitored by the service proxies, and to which the user interface and automatic monitoring of some embodiments are applied, are described in the following figures, FIGS. 1-19.

FIG. 1 illustrates an example of segregated guest and service planes that are implemented in some embodiments by two logical forwarding elements (LFEs) 130 and 132. As shown, two guest machines 102 and 104 and three service machines 106, 108, and 110 execute on three host computers 112, 114, and 116 along with three software forwarding elements 120, 122, and 124. In this example, the guest machines and service machines are guest virtual machines (GVMs) and service virtual machines (SVMs), but in other embodiments these machines can be other types of machines, such as containers.

Also, in this example, each logical forwarding element is a distributed forwarding element that is implemented by configuring multiple software forwarding elements (SFEs) on multiple host computers. To do this, each SFE or a module associated with the SFE in some embodiments is configured to encapsulate the data messages of the LFE with an overlay network header that contains a virtual network identifier (VNI) associated with the overlay network. As such, the LFEs are said to be overlay network constructs that span multiple host computers in the discussion below.

The LFEs also span in some embodiments configured hardware forwarding elements (e.g., top of rack switches). In some embodiments, each LFE is a logical switch that is implemented by configuring multiple software switches (called virtual switches or vswitches) or related modules on multiple host computers. In other embodiments, the LFEs can be other types of forwarding elements (e.g., logical routers), or any combination of forwarding elements (e.g., logical switches and/or logical routers) that form logical networks or portions thereof. Many examples of LFEs, logical switches, logical routers, and logical networks exist today, including those provided by VMware's NSX network and service virtualization platform.

As shown, the LFE 130 defines the guest forwarding plane that connects the GVMs 102 and 104 in order to forward data messages between these GVMs. In some embodiments, this LFE is a logical switch that connects to a logical router, which connects the GVMs directly or through a logical gateway to networks outside of the logical switch's logical network. The LFE 130 is implemented in some embodiments by configuring software switches 120 and 122 and/or their related modules (e.g., related port/VNIC filter modules) on the host computers 112 and 114 to implement a first distributed logical switch.

FIG. 1 and other figures discussed below show the source and destination GVMs being on the same logical network and being connected to the same LFE. One of ordinary skill will realize that the service operations of some embodiments do not require the source and destination machines to be connected to the same LFE, or to even be in the same network or the same datacenter. These service operations are performed on data messages that exit the source machine's network or enter a source machine's network. The figures depict the source and destination machines as connected to the same LFE to emphasize that the service plane 132 is implemented by a logical network that is separate from the logical network that forwards the data messages associated with the guest machines.

The LFE 132 defines the service forwarding plane that connects the SVMs 106, 108, and 110 in order to forward data messages associated with the GVMs through service paths that include the SVMs. In some embodiments, the LFE 132 is also a logical switch that is implemented by configuring software switches 120, 122, and 124 and/or their related modules on the host computers 112, 114, and 116 to implement a second distributed logical switch. Instead of configuring the same set of SFEs to implement both the guest and service forwarding planes (i.e., the guest and service LFEs), other embodiments configure one set of SFEs on a set of host computers to implement the guest forwarding plane and another set of SFEs on the set of host computers to implement the service forwarding plane. For instance, in some embodiments, each host computer executes a guest software switch and a service software switch, and these two switches and/or their related modules can be configured to implement a guest logical switch and a service logical switch.

In some embodiments, the software switches 120, 122, and 124 and/or their related modules can be configured to implement multiple guest forwarding planes (e.g., guest LFEs) and multiple service forwarding planes (e.g., service LFEs) for multiple groups of machines. For instance, for a multi-tenant datacenter, some such embodiments define a guest LFE and a service LFE for each tenant for which at least one chain of services needs to be implemented. For each group of related machines (e.g., for each tenant's machines), some embodiments define two virtual network identifiers (VNIs) to configure a shared set of software forwarding elements (e.g., software switches) to implement the two different forwarding planes, i.e., the guest forwarding plane and the service forwarding plane. These two VNIs are referred to below as the guest VNI (GVNI) and the service VNI (SVNI). In FIG. 1, the guest LFE ports 150 and 152 are associated with the GVNI, while the service LFE ports 154, 156, and 158 are associated with the SVNI, as shown.
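The association of LFE ports with the two VNIs can be illustrated with a small lookup table. In the sketch below, the port numbers follow the reference numerals of FIG. 1, while the numeric VNI values, the dictionary-based "header", and the names `PORT_VNI`, `plane_for_port`, and `encapsulate` are assumptions invented for this example rather than details of the described embodiments.

```python
# Hypothetical VNI values; the description does not give numeric VNIs.
GVNI = 5001
SVNI = 5002

# Port-to-VNI association following the reference numerals of FIG. 1.
PORT_VNI = {150: GVNI, 152: GVNI, 154: SVNI, 156: SVNI, 158: SVNI}


def plane_for_port(port):
    """Return which forwarding plane a port belongs to."""
    return "guest" if PORT_VNI[port] == GVNI else "service"


def encapsulate(port, payload):
    """Wrap a payload in a stand-in overlay header that carries the VNI."""
    return {"vni": PORT_VNI[port], "payload": payload}
```

Because the two planes differ only in the VNI carried in the overlay header, a shared set of software switches can keep guest and service traffic segregated.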

In some embodiments, the service plane 132 is also implemented by inserting modules in input/output (IO) chains of a GVM's egress and ingress datapaths to and from an SFE 120 or 122. In this implementation, the service plane 132 can identify a data message sent from the GVM or received for the GVM, forward the data message to a set of SVMs to perform a chain of services on the data message, and then return the data message back to the GVM's datapath so that the data message can proceed along its datapath to the software switch or to the GVM (i.e., so that the data message can be processed based on the destination network addresses specified by the source GVM). Such a GVM is referred to below as the source GVM as the data message being processed by the service nodes is a data message identified on the GVM's egress or ingress path. In some embodiments, a GVM's egress/ingress IO chain is implemented as a set of hooks (function calls) in the GVM's VNIC (virtual network interface card) 180 or the SFE port associated with the GVM's VNIC (e.g., the SFE port communicating with the GVM's VNIC).

Before providing an example of the IO chain components of some embodiments that implement the service plane, FIG. 2 illustrates an example of a data message 202 from the GVM 102 to GVM 104 being redirected along the service plane 132 so that the data message can be processed by SVMs 108 and 110 that perform a chain of two service operations. As shown, the service LFE 132 first forwards the data message to SVM 108, and then forwards the data message to SVM 110, before returning the data message back to the egress path of GVM 102 so that the data message can be processed based on the destination network addresses specified by the source GVM 102.

The service LFE in some embodiments forwards the data message between hosts 112, 114, and 116 by using an overlay encapsulation header that stores the SVNI for the service LFE. Also, when the service LFE is a service logical switch, the service forwarding plane in some embodiments uses the MAC addresses associated with the SVMs (e.g., MAC addresses of SVM VNICs) to forward the data message between ports of the service logical switch. In some embodiments, the MAC forwarding also uses a service plane MAC address associated with the source GVM, even though this GVM does not directly connect to the service plane but instead connects to the service plane through a port proxy, as further described below.

Once the data message 202 returns to the egress path of the GVM 102, the guest LFE 130 forwards the data message to its destination (e.g., as specified by the destination network address in the data message's header), which is GVM 104. The guest LFE 130 in some embodiments forwards the data message between hosts 112 and 114 by using an overlay encapsulation header that stores the GVNI for the guest LFE. Also, when the guest LFE is a logical switch, the guest forwarding plane in some embodiments uses the guest plane MAC addresses associated with the GVMs 102 and 104 to forward the data message (e.g., by using the guest plane MAC address of GVM 104 to forward the data message to the guest forwarding port 152 associated with this GVM). While the service plane of FIG. 2 captures a data message passing through a GVM's egress path, the service plane in some embodiments can also capture a data message as it is passing through a GVM's ingress path before it reaches the GVM's VNIC.

In some embodiments, a chain of service operations is referred to as a service chain. A service chain in some embodiments can be implemented with one or more sets of service nodes (e.g., service machines or appliances), with each set of service nodes defining a service path. Hence, in some embodiments, a service chain can be implemented by each of one or more service paths. Each service path in some embodiments includes one or more service nodes for performing the set of one or more services of the service chain and a particular order through these nodes.

FIG. 3 presents an object diagram that illustrates the relationship between a service chain 302 and a set of one or more service paths 304 that implement the service chain. Each service chain has a service chain (SC) identifier 306, while each service path has a service path identifier (SPI) 308. Each service path is associated with a set of m service nodes, which, as shown, are identified in terms of service instance endpoints 310. Service instance endpoints in some embodiments are logical locations in the network where traffic can go to or come from a service node connected to the service plane. In some embodiments, a service instance endpoint is one LFE port (e.g., an SFE port) associated with a service node (e.g., a VNIC of an SVM). In these or other embodiments, a service instance endpoint can be associated with two LFE ports used for a service node as further described below for embodiments that use GRE encapsulation. Also, the service endpoints in some embodiments are addressable through MAC addresses associated with the LFE ports or with the SVM VNICs associated with (e.g., communicating with) these LFE ports.

In some embodiments, each service chain 302 is defined by references to one or more service profiles 312, with each service profile associated with a service operation in the chain. As described below, a service node in some embodiments (1) receives, from a service manager, a mapping of a service chain identifier to a service profile that it has to implement, and (2) receives, with a data message, a service chain identifier that it maps to the service profile to determine the service operation that it has to perform. In some embodiments, the received mapping is not only based on the service chain identifier (SCI) but is also based on a service index value (that specifies the location of the service node in a service path) and a direction through a service chain (that specifies an order for performing the sequence of services specified by the service chain). The service profile in some embodiments describes the service operation that the service node has to perform. In some embodiments, a service profile can identify a set of rules for a service node to examine.

Also, in some embodiments, service insertion rules 314 are defined by reference to service chain identifier 306 for service insertion modules associated with GVMs. Such service insertion modules use these service insertion rules 314 to identify service chains to use to process data messages associated with a source GVM. As mentioned above, the data messages are referred to below as being from a source GVM as the data messages that are processed by the service chains are identified on the egress paths from or ingress paths to the GVMs.

As further described below, the service insertion (SI) rules associate flow identifiers with service chain identifiers. In other words, some embodiments try to match a data message's flow attributes to the flow identifiers (referred to below as rule identifiers of the SI rules) of the service insertion rules, in order to identify a matching service insertion rule (i.e., a rule with a set of flow identifiers that matches the data message's flow attributes) and to assign this matching rule's specified service chain as the service chain of the data message. A specific flow identifier (e.g., one defined by reference to a five-tuple identifier) could identify one specific data message flow, while a more general flow identifier (e.g., one defined by reference to fewer than all five tuple attributes) can identify a set of several different data message flows that match the more general flow identifier. As such, a matching data message flow is any set of data messages that have a common set of attributes that matches a rule identifier of a service insertion rule.
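The distinction between specific and general flow identifiers can be illustrated with a minimal sketch. It assumes, purely for illustration, that a rule identifier is a dictionary naming only the five-tuple attributes it constrains and that rules are examined in list order; the names SI_RULES, rule_matches, and find_service_chain and the SCI strings are hypothetical and do not come from the described embodiments.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a rule identifier lists only the flow
# attributes it constrains; any attribute it omits matches every value.
RuleId = Dict[str, object]
Flow = Dict[str, object]

def rule_matches(rule_id: RuleId, flow: Flow) -> bool:
    """True when every attribute named by the rule identifier equals the flow's value."""
    return all(flow.get(name) == value for name, value in rule_id.items())

def find_service_chain(si_rules: List[Tuple[RuleId, str]], flow: Flow) -> Optional[str]:
    """Return the SCI of the first matching SI rule (list order is an assumption)."""
    for rule_id, sci in si_rules:
        if rule_matches(rule_id, flow):
            return sci
    return None

SI_RULES: List[Tuple[RuleId, str]] = [
    # Specific identifier: all five attributes, so it matches one flow.
    ({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 5000,
      "dst_port": 443, "protocol": "tcp"}, "SC-1"),
    # General identifier: two attributes, so it matches many flows.
    ({"dst_port": 443, "protocol": "tcp"}, "SC-2"),
]

flow_a = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 5000,
          "dst_port": 443, "protocol": "tcp"}
flow_b = {"src_ip": "10.0.0.9", "dst_ip": "10.0.0.2", "src_port": 6000,
          "dst_port": 443, "protocol": "tcp"}
flow_c = {"src_ip": "10.0.0.9", "dst_ip": "10.0.0.2", "src_port": 6000,
          "dst_port": 53, "protocol": "udp"}
```

In this sketch flow_a is claimed by the specific rule, flow_b falls through to the general rule, and flow_c matches no rule, so no service chain is assigned to it.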

As further described below, other embodiments use contextual attributes associated with a data message flow to associate the data message with a service insertion rule. Numerous techniques for capturing and using contextual attributes for performing forwarding and service operations are described in U.S. patent application Ser. No. 15/650,251, which is incorporated herein. Any of these techniques can be used in conjunction with the embodiments described herein.

Next hop forwarding rules 316 in some embodiments are defined by reference to the SPI values 308 and service instance endpoints 310. Specifically, in some embodiments, a service path is selected for a service chain that has been identified for a data message. At each hop, these embodiments use the forwarding rules 316 to identify the next service instance endpoint based on the SPI value for this service path along with a current service index (SI) value, which identifies the location of the hop in the service path. In other words, each forwarding rule in some embodiments has a set of matching criteria defined in terms of the SPI/SI values, and specifies a network address of the next hop service instance endpoint that is associated with these SPI/SI values. To optimize the next hop lookup for the first hop, some embodiments provide to the source GVM's service insertion module the next hop network address with the SPI, as part of a service path selection process.

FIG. 4 illustrates an example of a service chain and its associated service path. As shown, each service chain 405 in some embodiments is defined as a sequential list of service profiles 410, with each profile in this example related to a different middlebox service (such as firewall, load balancer, intrusion detector, data message monitor, etc.). Also, in this example, each of the M profiles can be implemented by one SVM in a cluster m of VMs. As shown, different clusters for different profiles can have different numbers of SVMs. Also, in some embodiments, one service profile is implemented by one service node (i.e., a cluster of several service nodes is not required to implement a service profile).

Since multiple SVMs in a cluster can provide a particular service, some embodiments define, for a given service chain, multiple service paths through multiple different combinations of SVMs, with one SVM of each cluster being used in each combination. In the example of FIG. 4, there are N service paths associated with the service chain 405, traversed by data messages originating at a GVM 402 on their way to a GVM 404. Each service path is identified by a different set of dashed lines in this figure.

Specifically, the first service path passes through first SVM 1,1 of the first service profile's cluster to implement the first service of the forward service chain 405, the first SVM 2,1 of the second service profile's cluster to implement the second service of the forward service chain 405, and third SVM M,3 of the Mth service profile's cluster to implement the Mth service of the forward service chain 405. The second service path passes through second SVM 1,2 of the first service profile's cluster to implement the first service of the forward service chain 405, the first SVM 2,1 of the second service profile's cluster to implement the second service of the forward service chain 405, and first SVM M,1 of the Mth service profile's cluster to implement the Mth service of the forward service chain 405.

The third service path passes through third SVM 1,3 of the first service profile's cluster to implement the first service of the forward service chain 405, the second SVM 2,2 of the second service profile's cluster to implement the second service of the forward service chain 405, and second SVM M,2 of the Mth service profile's cluster to implement the Mth service of the forward service chain 405. The Nth service path passes through third SVM 1,3 of the first service profile's cluster to implement the first service of the forward service chain 405, the second SVM 2,2 of the second service profile's cluster to implement the second service of the forward service chain 405, and fourth SVM M,4 of the Mth service profile's cluster to implement the Mth service of the forward service chain 405. As the example illustrates, different service paths may use the same SVM for a given service operation. However, regardless of the service path that a given data message traverses, the same set of service operations is performed in the same sequence, for paths that are associated with the same service chain and the same service direction.
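The idea that each service path takes one SVM from each profile's cluster can be sketched as a Cartesian product over the clusters. This is only an illustration: the cluster sizes below are assumptions loosely based on the SVM labels mentioned for FIG. 4, the function name candidate_service_paths is hypothetical, and the described embodiments may define only a subset (the N service paths) of the combinations enumerated here.

```python
from itertools import product
from typing import List, Sequence, Tuple

def candidate_service_paths(clusters: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Enumerate every combination that takes exactly one SVM from each cluster.

    clusters[i] holds the SVMs that can implement the (i+1)th service profile,
    so every returned tuple performs the services in the same order.
    """
    return list(product(*clusters))

# Assumed cluster membership, for illustration only.
CLUSTERS = [
    ["SVM 1,1", "SVM 1,2", "SVM 1,3"],             # first service profile
    ["SVM 2,1", "SVM 2,2"],                        # second service profile
    ["SVM M,1", "SVM M,2", "SVM M,3", "SVM M,4"],  # Mth service profile
]

PATHS = candidate_service_paths(CLUSTERS)
```

Because the product preserves cluster order, every candidate path performs the service profiles in the same sequence, and two paths can share an SVM for a given service, as the first and second service paths above share SVM 2,1.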

In some embodiments, a service chain has to be performed in a forward direction for data messages from a first GVM to a second GVM, and then in the reverse direction for data messages from the second GVM to the first GVM. In some such embodiments, the service plane selects both the service path for the forward direction and the service path for the reverse direction when it processes the first data message in the flow from the first GVM to the second GVM. Also, in some of these embodiments, the forward and reverse service paths are implemented by the same sets of service nodes but in the reverse order.

FIG. 5 illustrates examples of reverse service paths for the forward service paths illustrated in FIG. 4. While the forward service paths are for performing M services on data messages from GVM 402 to GVM 404, the reverse service paths are for performing M services on data messages from GVM 404 to GVM 402. Also, the order of these services is reversed, with the service paths in FIG. 5 performing service profiles M to 1, while the service paths in FIG. 4 perform service profiles 1 to M.

Also, in the examples of FIGS. 4 and 5, each reverse service path has one corresponding forward service path that is implemented by the same exact set of SVMs but in the reverse order, as indicated by the service path legends and the similar dashed lines in these figures. For example, the forward, second service path passes through SVM 1,2 for the first service associated with the first profile, SVM 2,1 for the second service associated with the second profile, and SVM M,1 for the Mth service associated with the Mth service profile, while the associated reverse, second service path passes through SVM M,1 for the first service associated with the Mth service profile, SVM 2,1 for the second service associated with the second profile, and SVM 1,2 for the Mth service associated with the first profile.

In some embodiments, the same service nodes are used for the forward and reverse paths because at least one of the service nodes (e.g., a firewall SVM) that implements one of the service profiles needs to see the data traffic in both directions between two data endpoints (e.g., two GVMs). In other embodiments, the same service nodes do not need to be used for both directions of data message flows between two data endpoints so long as the same set of service operations are performed in opposite orders.

FIG. 6 illustrates an example of the IO chain components that implement the service plane in some embodiments. As shown, the service plane 132 is implemented by software switches 120, 122, and 124 executing on the host computers and two sets of modules 610, 612, 614, 620, 624, 626, and 628 on these computers. The implemented service plane in this example, as well as in some of the other examples illustrated in some of the subsequent figures, is an overlay logical L2 service plane. One of ordinary skill will realize that other embodiments are implemented by other types of service planes, such as overlay L3 service planes, or overlay networks with multiple L2 logical switches and one or more logical L3 routers.

In FIG. 6, the software switches 120, 122, and 124 and modules 610, 612, 614, 620, 624, 626, and 628 implement two different layers of the service plane, which are the service insertion layer 602 and the service transport layer 604. The service insertion layer 602 (1) identifies the service chain for a data message, (2) selects the service path to use to perform the service operations of the service chain, (3) identifies the next-hop service nodes at each hop in the selected service path (including the identification of the source host computer to which the data message should be returned upon the completion of the service chain), and (4) for the service path, specifies the service metadata (SMD) header attributes for the data message. The SMD attributes in some embodiments include the network service header (NSH) attributes per RFC (Request for Comments) 8300 of IETF (Internet Engineering Task Force).

The service transport layer 604, on the other hand, formulates the service overlay encapsulation header and encapsulates the data message with this header so that it can pass between service hops. In some embodiments, the service transport layer 604 modifies the SMD header to produce the service overlay encapsulation header. For instance, in some of these embodiments, the overlay encapsulation header is a Geneve header with the SMD attributes stored in a TLV (type, length, value) section of the Geneve header. In other embodiments, the service transport layer 604 adds the service overlay encapsulation header to an SMD header that is first used to encapsulate the data message. Also, when traversing between two hops (e.g., between two service nodes) executing on the same host computer, the service transport layer in several embodiments described below does not encapsulate the data message with an overlay encapsulation header. In other embodiments, even when traversing between two hops on the same host computer, the service transport layer encapsulates the data message with an overlay encapsulation header.

In some embodiments, the service insertion (SI) layer 602 includes an SI pre-processor 610 and an SI post-processor 612 in each of the two IO chains 650 and 652 (i.e., the egress IO chain 650 and the ingress IO chain 652) of a GVM for which one or more service chains are defined. The SI layer 602 also includes a service proxy 614 for each service node connected to the service plane (e.g., for each SVM with a VNIC paired with a service plane LFE port). The service transport (ST) layer 604 includes one STL port proxy 620 on each host computer that has one or more possible source GVMs for which one or more service chains are defined. The ST layer 604 also has (1) an STL caller 624 in each IO chain of each source GVM, (2) an STL module 626 in the IO chain of each SVM, and (3) one or more encap processors 628.

For a data message that passes through a GVM's ingress or egress datapath, the SI pre-processor 610 on this datapath performs several operations. It identifies the service chain for the data message and selects the service path for the identified service chain. The pre-processor also identifies the network address for a first hop service node in the selected service path and specifies the SMD attributes for the data message. The SMD attributes include in some embodiments the service chain identifier (SCI), the SPI and SI values, and the direction (e.g., forward or reverse) for processing the service operations of the service chain. In some embodiments, the SPI value identifies the service path while the SI value specifies the number of service nodes.

After the SI pre-processor completes its operation, the STL caller 624 in the same datapath calls the STL port proxy 620 to relay the SMD attributes and first hop's network address that the pre-processor identified, so that the port proxy can forward the SMD attributes through the service plane to the first hop. The port proxy formats the data message for forwarding to the first service node. In some embodiments, this formatting comprises replacing the original source and destination MAC addresses in the data message with a service plane MAC address that is associated with the source GVM 102 and the MAC address of the first hop service node. This formatting also stores a set of attributes for the data message that should be processed by other service transport layer modules (e.g., the other STL modules, etc.) on the same host computer. These data message attributes include the SMD attributes as well as the original source and destination MAC addresses.

The STL port proxy 620 passes the formatted data message along with its stored attributes to the software switch 120. Based on the destination MAC address (i.e., the first hop MAC address) of the formatted data message, the software switch delivers the data message to the switch port associated with the first hop SVM. When the first hop is on the same host computer as the port proxy 620, the data message is provided to the STL module 626 in the ingress IO chain of the first hop's service node on the same host computer. When the first hop is not on the same host computer, the data message is encapsulated with an encapsulating header and forwarded to the next hop, as further described below.

Each hop's STL module 626 re-formats the data message by replacing the service plane source MAC address and service plane destination MAC address (i.e., its service node's MAC address) with the original source and destination MAC addresses of the data message. It then passes this re-formatted data message with its accompanying SMD attributes to its hop's service proxy 614. This service proxy is in the IO chain of the ingress datapath of the SVM. For purposes of preventing the illustration in FIG. 6 from being overcomplicated with unnecessary detail, the ingress and egress paths of each SVM in this example are combined in this figure, unlike the ingress and egress paths 650 and 652 of the GVM 102.
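The MAC rewriting performed by the port proxy and undone by a hop's STL module can be sketched as a round trip over a message's stored attribute set. The sketch below is a simplified illustration under stated assumptions: the DataMessage class, its field names, the attribute keys, and the two function names are all hypothetical, and real embodiments carry more state than is shown here.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class DataMessage:
    """Hypothetical data message: L2 addresses plus a stored attribute set."""
    smac: str
    dmac: str
    attrs: Dict[str, object] = field(default_factory=dict)

def port_proxy_format(msg: DataMessage, gvm_service_mac: str,
                      first_hop_mac: str, smd: Dict[str, object]) -> DataMessage:
    """Swap in service plane MACs, keeping the originals and SMD as attributes."""
    msg.attrs["orig_smac"] = msg.smac
    msg.attrs["orig_dmac"] = msg.dmac
    msg.attrs["smd"] = smd
    msg.smac = gvm_service_mac   # service plane MAC associated with the source GVM
    msg.dmac = first_hop_mac     # MAC of the first hop service node
    return msg

def stl_module_restore(msg: DataMessage) -> DataMessage:
    """Put the original MACs back before handing the message to the service proxy."""
    msg.smac = msg.attrs["orig_smac"]
    msg.dmac = msg.attrs["orig_dmac"]
    return msg
```

Keeping the original addresses in the attribute set rather than in the frame is what lets the software switch forward on service plane MAC addresses while the service node still sees the message as the source GVM sent it.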

The service proxy 614 encapsulates the received data message with an encapsulating NSH header that stores the data message's SMD attributes and provides this encapsulated data message to its service node when the service node can support NSH headers. When the service node is an SVM, the service proxy in some embodiments supplies the data message and its NSH header to the SVM's VNIC through a VNIC injection process, as further described below. When the service node cannot process NSH headers, the service proxy 614 stores the SMD attributes into a legacy QinQ encapsulating header or a GRE encapsulating header, and then passes the encapsulated data message to the VNIC of the SVM. These headers will be further described below.

In some embodiments, the service proxy 614 of each service hop segregates the service node for that hop from the service transport layer. This segregation improves the security of both the SVM and the service transport layer. It also allows the service proxy to ensure that the data messages that are provided to its SVM are formatted properly, which is especially important for legacy SVMs that do not support the newer NSH format.

The service proxy 614 in some embodiments also performs liveness detection signaling with its service node to ensure that the service node is operational. In some embodiments, the service proxy sends a data message with a liveness value to its service node at least once in each recurring time period. To do this, the service proxy sets and resets a timer to ensure that it has sent a liveness signal for each time period to its service node. Each liveness value is accompanied by a liveness sequence number to allow the service proxy to keep track of liveness responses provided by the SVM. Each time the service node replies to a liveness signal, it provides to the service proxy the same liveness value in a responsive data message in some embodiments or its corresponding value in the responsive data message in other embodiments. Also, with each liveness responsive data message, the service node provides the same sequence number in some embodiments, or an incremented version of the sequence number provided by the service proxy in other embodiments.

As further described below, the service proxy of some embodiments piggybacks some of its liveness detection signaling on each data message that it passes to its service node from the service forwarding plane. Each time that the service proxy sends a liveness signal to its service node, it resets its liveness timer. Each time the service node processes the data message, it provides the processed data message back to the service proxy with the responsive liveness value and associated sequence number (incremented in some embodiments, or non-incremented in other embodiments, as mentioned above).

In some embodiments, the service proxy registers a liveness detection failure when the service node does not respond to its liveness signal within a particular time (e.g., within 0.3 seconds). After registering two successive liveness detection failures, the service proxy in some embodiments notifies a local control plane (LCP) module executing on its host that the SVM has failed so that the LCP can notify a central control plane (CCP) server. In response to such a notification, the CCP removes the SVM and the service paths on which the SVM resides from the forwarding and path selection rules in the data plane, and if needed, generates additional service paths for the failed SVM's associated service chain. Also, in some embodiments, the service proxy sends an in-band data message back to the source GVM to program its classifier to not select the service path on which the failed service node resides.
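The failure-detection rule described above (a missed reply within a time limit counts as a failure, and two successive failures trigger a notification) can be sketched as a small state machine. This is an illustrative sketch only: the LivenessMonitor class, its method names, and the notify_lcp callback are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the sketch assumes the service node echoes the same sequence number.

```python
from typing import Callable, Dict

class LivenessMonitor:
    """Hypothetical tracker for liveness signals sent by a service proxy."""

    def __init__(self, notify_lcp: Callable[[], None],
                 timeout_s: float = 0.3, max_failures: int = 2) -> None:
        self.notify_lcp = notify_lcp
        self.timeout_s = timeout_s
        self.max_failures = max_failures
        self.seq = 0
        self.pending: Dict[int, float] = {}  # sequence number -> send time
        self.successive_failures = 0
        self.failed = False

    def send(self, now: float) -> int:
        """Record an outgoing liveness signal and return its sequence number."""
        self.seq += 1
        self.pending[self.seq] = now
        return self.seq

    def on_reply(self, seq: int, now: float) -> None:
        """A timely reply clears the failure streak; late replies are ignored."""
        sent = self.pending.pop(seq, None)
        if sent is not None and now - sent <= self.timeout_s:
            self.successive_failures = 0

    def check(self, now: float) -> None:
        """Count every signal whose reply is overdue, then notify once if needed."""
        for seq, sent in list(self.pending.items()):
            if now - sent > self.timeout_s:
                del self.pending[seq]
                self.successive_failures += 1
        if self.successive_failures >= self.max_failures and not self.failed:
            self.failed = True
            self.notify_lcp()
```

A caller would invoke send on each timer tick (or when piggybacking on a data message), on_reply when the service node answers, and check periodically; the notify_lcp callback stands in for the notification to the local control plane.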

In some embodiments, the service proxy also performs flow programming at the behest of its service node. This flow programming in some embodiments involves modifying how the source GVM's IO chain selects service chains, service paths, and/or forwards data message flows along service paths. In other embodiments, this flow programming involves other modifications to how a data message flow is processed by the service plane. Flow programming will be further described below.

Upon receiving a data message and its SMD attributes (in an encapsulating NSH header or some other encapsulating header), the SVM performs its service operation. In some embodiments, the SVM uses mapping records that it receives from its service manager to map the SCI, SI and direction values in the SMD attributes to a service profile, and then maps this service profile to one of its rule sets, which it then examines to identify one or more service rules to process. In some embodiments, each service rule has a rule identifier that is defined in terms of data message attributes (e.g., five tuple attributes, which are the source and destination IP address, source and destination port addresses and the protocol). The SVM in some embodiments compares the rule's identifier with the attributes of the data message to identify a matching rule. Upon identifying one or more matching rules, the SVM in some embodiments performs an action specified by the highest priority matching rule. For instance, a firewall SVM might specify that the data message should be allowed to pass, should be dropped and/or should be redirected.
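The two-step lookup described above (SMD attributes to service profile, service profile to rule set, then highest priority matching rule) might look like the following sketch. Everything named here is an assumption made for illustration: the ServiceRule class, the convention that a larger number means higher priority, the use of None as a wildcard, and the sample profile and actions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

FiveTuple = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol

@dataclass(frozen=True)
class ServiceRule:
    priority: int                        # assumption: larger value wins
    match: Tuple[Optional[object], ...]  # five entries; None matches anything
    action: str

def rule_matches(rule: ServiceRule, msg: FiveTuple) -> bool:
    return all(want is None or want == got for want, got in zip(rule.match, msg))

def svm_action(profile_map: Dict[Tuple[str, int, str], str],
               rule_sets: Dict[str, List[ServiceRule]],
               smd: Dict[str, object], msg: FiveTuple) -> Optional[str]:
    """Map (SCI, SI, direction) to a profile, then apply its best matching rule."""
    profile = profile_map[(smd["sci"], smd["si"], smd["direction"])]
    matching = [r for r in rule_sets[profile] if rule_matches(r, msg)]
    if not matching:
        return None
    return max(matching, key=lambda r: r.priority).action

PROFILE_MAP = {("SC-1", 3, "forward"): "fw-profile"}
RULE_SETS = {
    "fw-profile": [
        ServiceRule(10, (None, None, None, 22, "tcp"), "drop"),
        ServiceRule(1, (None, None, None, None, None), "allow"),
    ]
}
SMD = {"sci": "SC-1", "si": 3, "direction": "forward"}
```

With this sample data, a TCP message to port 22 matches both rules and takes the higher priority "drop" action, while any other message falls through to "allow".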

Once the SVM has completed its service operation, the SVM forwards the data message along its egress datapath. The service proxy in the egress datapath's IO chain then captures this data message and for this data message, identifies the network address of the next hop in the service path. To do this, the service proxy in some embodiments decrements the SI value, and then uses this decremented value along with the SPI value in the data message's stored attribute set to identify an exact match forwarding rule that identifies a next hop network address. In some embodiments, the SVM can decrement the SI value. For such cases, the service proxy in some embodiments can be configured not to decrement the SI value when its corresponding SVM decremented it.

In either configuration, the service proxy identifies the next hop network address by using the appropriate SPI/SI values to identify the next-hop forwarding rule applicable to the data message. When the proxy's service node is on multiple service paths, the proxy's forwarding rule storage stores multiple exact match forwarding rules that can specify different next hop network addresses for different SPI/SI values associated with different service paths. Assuming that the decremented SI value is not zero, the next hop in the service path is another service node. Hence, the proxy in some embodiments provides the next hop's MAC address to the proxy's associated STL module 626 in the SVM's egress datapath. This module then re-formats the data message, by specifying the SVM's MAC address and the next hop's MAC address as the source and destination MAC addresses and storing the original source and destination MAC addresses of the data message in the set of attributes stored for the data message. The STL module 626 then forwards the data message along the egress path, where it reaches the software switch, which then has to forward the data message and its stored attributes to the next hop service node.

When the next hop is on the same host computer, the software switch passes the data message and its attributes to the port that connects to the STL module of the next hop's service node, as described above. On the other hand, when the next hop service node is on another host computer, the software switch provides the data message to the uplink port that connects to the VTEP (VXLAN Tunnel Endpoint) that communicates through an overlay network tunnel with a VTEP on the other host computer. An encap processor 628 then captures this data message along the egress path of this port, defines an encapsulating overlay header for this data message and encapsulates the data message with this overlay header. In some embodiments, the overlay header is a single header that stores both SMD and STL attributes. For instance, in some embodiments, the overlay header is a Geneve header that stores the SMD and STL attributes in one or more TLVs.

As mentioned above, the SMD attributes in some embodiments include the SCI value, the SPI value, the SI value, and the service direction. Also, in some embodiments, the STL attributes include the original L2 source MAC address, the original L2 destination MAC address, the data message direction, and the service-plane source MAC address of the source GVM. In some embodiments, the service direction and the service-plane source MAC address are already part of the SMD attributes. The service transport layer in some embodiments needs these attributes with each processed data message, in order to recreate the original data message and later, at the end of the service path, to return the data message to the original host to resume along its datapath.

When the encapsulated data message is received at the next hop's host computer, the data message is captured by the encap processor 628 of the software switch's downlink port that connects to the VTEP that received the data message from the prior hop's VTEP. This encap processor removes the encapsulation header from the data message and stores the STL and SMD attributes as the set of attributes of the data message. It then passes the decapsulated message to the downlink port, which then passes it to the software switch to forward to the next hop's switch port. From there the data message is processed by the STL module and service proxy before reaching the service node, as described above.

When the service proxy determines that the decremented SI value is zero, the service proxy matches the decremented SI value and the embedded SPI value with a rule that directs the service proxy to identify the next hop as the service plane MAC address of the source GVM. In some embodiments, this determination is not specified by a forwarding entry of a forwarding table, but rather is hard coded into the logic of the service proxy. Hence, when the SI value is zero, the proxy provides the source GVM's service plane MAC address to its associated STL module 626 to use to forward the data message back to the GVM's host computer. The STL module then defines the message's destination MAC (DMAC) address as the source GVM's service plane MAC address while defining the message's source MAC (SMAC) address as the service plane MAC address associated with its service node (e.g., the service plane MAC of the software switch's port associated with the service node). It also stores the original SMAC and DMAC of the data message in the attribute set of the data message.
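The next-hop decision made by a service proxy on its SVM's egress path can be summarized in a short sketch. It assumes, for illustration only, that the exact match forwarding rules are held in a dictionary keyed by (SPI, SI) and that the SI-equals-zero case is hard coded, as in the embodiments where no forwarding entry covers it; the function name proxy_next_hop and all MAC strings are hypothetical.

```python
from typing import Dict, Tuple

def proxy_next_hop(forwarding_rules: Dict[Tuple[int, int], str],
                   spi: int, si: int, source_gvm_service_mac: str,
                   svm_decremented: bool = False) -> Tuple[int, str]:
    """Return the SI to carry forward and the MAC address of the next hop.

    The proxy decrements the SI unless it is configured to rely on the SVM
    having already done so. A resulting SI of zero means the service path is
    finished, so the message goes back to the source GVM's service plane MAC.
    """
    new_si = si if svm_decremented else si - 1
    if new_si == 0:
        return new_si, source_gvm_service_mac    # hard-coded return to the source
    return new_si, forwarding_rules[(spi, new_si)]  # exact match on SPI/SI

# Hypothetical rules for one three-hop service path with SPI 7.
RULES = {(7, 2): "mac-svm-b", (7, 1): "mac-svm-c"}
```

Walking this path with an initial SI of 3, the first hop's proxy selects mac-svm-b, the second hop's proxy selects mac-svm-c, and the third hop's proxy reaches an SI of zero and returns the message to the source GVM.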

The STL module then passes the formatted data message and its attributes along the egress path, where it reaches its associated software switch port. The software switch then passes this message to its uplink port. The encap processor 628 of this port then captures this data message, defines an encapsulating overlay header for this data message and encapsulates the data message with this overlay header. As mentioned above, this overlay header in some embodiments is a Geneve header that stores the SMD and STL attributes in one or more TLVs. This encapsulated data message then traverses the overlay network to reach the source GVM's host computer, where this data message is decapsulated by the downlink port's encap processor, and is then provided to the software switch, which then forwards it to the port proxy.

Once the port proxy 620 receives the decapsulated data message, it identifies the GVM associated with this data message from the original source MAC address that is now part of the decapsulated data message's stored attributes. In some embodiments, the port proxy has a record that maps the original source MAC address and service direction in the SMD attributes of a received data message to a GVM on its host (e.g., to a software switch port associated with a guest forwarding plane and a GVM on its host). The port proxy then formats the data message to include its original SMAC and DMAC and provides the data message back to the source GVM's IO chain. The SI post-processor 612 in this IO chain then processes this data message, before returning this data message to the egress datapath of the GVM. The operations of this post-processor will be further described below.

One of ordinary skill will realize that the service insertion layer and service transport layer in other embodiments are implemented differently than the exemplary implementations described above. For instance, instead of using an L2 overlay (L2 transport layer) that relies on MAC addresses to traverse the different service hops, other embodiments use an L3 overlay (L3 transport layer) that uses L3 and/or L4 network addresses to identify successive service hops. Also, the above-described service insertion and/or transport modules can be configured to operate differently.

A more detailed example of the operations of the service insertion and service transport layers will now be described by reference to FIGS. 7-16. FIG. 7 illustrates a process 700 performed by the SI pre-processor 610 and STL caller 624 of some embodiments. This process is described below by reference to the data flow example illustrated in FIG. 8. The process 700 starts when the SI pre-processor 610 is called to analyze a data message that is sent along the ingress or egress datapath of a GVM.

As shown, the process 700 initially determines (at 705) whether the pre-processor 610 has previously selected a service chain and a service path for the data message's flow and stored the SMD attributes for the selected service chain and path. In some embodiments, the process 700 makes this determination by using the data message's attributes (e.g., its five tuple attributes) to try to identify a record for the message's flow in a connection tracker. The connection tracker stores records of message flows for which service chains and paths were previously selected, and each such record stores the SMD attributes for the selected chain and path.

FIG. 8 illustrates the pre-processor 610 receiving a data message 802 along the egress datapath of the GVM 102. It also shows the pre-processor initially checking a connection tracking storage 804 to try to find a connection record that has a flow identifier (e.g., a five-tuple identifier) that matches a set of attributes (e.g., five tuple attributes) of the received data message. In this example, the pre-processor 610 cannot find such a connection record as the received data message is the first data message for its flow.

When the process 700 determines (at 705) that the connection storage 804 has a connection record that matches the received data message, the process retrieves (at 710) the SMD attributes from this record, or from another record referenced by the matching connection record. The SMD attributes in some embodiments include the SCI, SPI, SI and direction values. From 710, the process transitions to 740, which will be described below.

On the other hand, when the process 700 determines (at 705) that the connection storage 804 does not have a connection record that matches the received data message, the process performs (at 715) a classification operation that tries to match the data message to a service insertion rule in a SI rule storage, which is illustrated in FIG. 8 as storage 806. In some embodiments, the SI rule storage 806 stores service insertion rules 822 that have rule identifiers defined in terms of one or more data message flow attributes (e.g., one or more of the five tuple attributes or portions thereof). Each service rule also specifies a SCI that identifies a service chain that is applicable to data message flows that match the rule identifier of the service rule.

At 720, the process determines whether the classification operation matches the data message's attributes to the rule identifier of a service insertion rule that requires a service chain to be performed on the data message. When the classification operation does not identify a service insertion rule that requires a service chain to be performed on the data message, the process 700 ends. In some embodiments, the SI rule storage 806 has a default low priority rule that matches any data message when the data message's attributes do not match any higher priority SI rule, and this default low priority rule specifies that no service chain has been defined for the data message's flow. No service chain is defined for a data message flow in some embodiments when no service operation needs to be performed on the data message flow.
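As an illustration only, the determinations at 705-720 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of any embodiment: the names FiveTuple, SmdAttributes, SiRule, classify and preprocess are hypothetical, the connection tracker is modeled as a dictionary keyed by five-tuple, and a lower priority number is assumed to mean a higher priority rule.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Optional, Tuple


class FiveTuple(NamedTuple):
    """Hypothetical flow identifier built from the five tuple attributes."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class SmdAttributes:
    """Hypothetical record of the SCI, SPI, SI and direction values."""
    sci: int
    spi: int
    si: int
    direction: str


@dataclass
class SiRule:
    priority: int             # assumption: lower number = higher priority
    match: Dict[str, object]  # five-tuple fields constrained by the rule identifier
    sci: Optional[int]        # None models "no service chain defined"


def classify(flow: FiveTuple, rules: List[SiRule]) -> Optional[int]:
    """Return the SCI of the highest priority matching rule (operations 715/720)."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if all(getattr(flow, name) == value for name, value in rule.match.items()):
            return rule.sci
    return None


def preprocess(flow: FiveTuple,
               conn_tracker: Dict[FiveTuple, SmdAttributes],
               rules: List[SiRule]) -> Tuple[str, object]:
    """Check the connection tracker first (705), otherwise classify (715)."""
    record = conn_tracker.get(flow)
    if record is not None:
        return ("cached", record)      # 710: reuse the stored SMD attributes
    sci = classify(flow, rules)
    if sci is None:
        return ("no_service", None)    # 720: no service chain is required
    return ("select_path", sci)        # continue to path selection


# An empty match dictionary models the default low priority rule.
rules = [
    SiRule(priority=10, match={"dst_port": 443, "protocol": "tcp"}, sci=7),
    SiRule(priority=1000, match={}, sci=None),
]
tracker: Dict[FiveTuple, SmdAttributes] = {}
web = FiveTuple("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
dns = FiveTuple("10.0.0.1", "10.0.0.3", 40001, 53, "udp")
```

In this sketch, the first message of the flow web yields a service chain identifier for path selection, while later messages of that flow would hit the cached record once the pre-processor has stored one.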

On the other hand, when the classification operation matches the data message's attributes to the rule identifier of a service insertion rule that requires a service chain to be performed on the data message, the process 700 performs (at 725) a path selection operation to select a service path for the service chain specified by the service insertion rule identified at 715. As shown in FIG. 8, the pre-processor 610 performs a path-selection operation by examining a path storage table 808 that identifies one or more service paths for each service chain identifier.

Each service path is specified in terms of its SPI value. When multipleservice paths are specified for a service chain, the path storage 808stores for each service chain a set of selection metrics 820 forselecting one SPI from the available SPIs. Different embodiments usedifferent selection metrics. For instance, some embodiments use aselection metric that costs a service path based on the number of hostson which the service nodes of the service path execute. In otherembodiments, these selection metrics are weight values that allow thepre-processor to select SPIs for a service chain in a load balancedmanner that is dictated by these weight values. For instance, in someembodiments, these weight values are generated by a central controlplane based on the load on each of the service nodes in the service pathand/or based on other costs (such as number of hosts traversed by theservice path, etc.).

In some of these embodiments, the pre-processor maintains a record of previous selections that it has made for a particular service chain, and selects subsequent service paths based on these previous selections. For example, for four service paths, the weight values might be 1, 2, 2, 1, which specify that on six successive SPI selections for a service chain, the first SPI should be selected once, the second and third SPIs should then be selected twice each, and the fourth SPI should be selected once. The next SPI selection for this service chain will then select the first SPI, as the selection mechanism is round robin.
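The weighted round robin selection in the above example can be sketched as follows. This is a hedged illustration only: the function name weighted_round_robin and the SPI values 1 through 4 are hypothetical, and the record of previous selections is modeled simply as the position of an iterator over an expanded schedule.

```python
from itertools import cycle
from typing import Iterator, List


def weighted_round_robin(spis: List[int], weights: List[int]) -> Iterator[int]:
    """Yield SPIs so that each SPI is selected `weight` times per round."""
    schedule = [spi for spi, weight in zip(spis, weights) for _ in range(weight)]
    return cycle(schedule)


# Four hypothetical SPIs with the weight values 1, 2, 2, 1 from the example.
selector = weighted_round_robin([1, 2, 3, 4], [1, 2, 2, 1])
picks = [next(selector) for _ in range(7)]
print(picks)  # [1, 2, 2, 3, 3, 4, 1]
```

The seventh selection wraps around to the first SPI, which matches the round robin behavior described above.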

In other embodiments, the weight values are associated with a numerical range (e.g., a range of hash values) and a number is randomly or deterministically generated for each data message flow to map the data message flow to a numerical range and thereby to its associated SPI. In still other embodiments, the host's LCP selects one service path for each service chain identifier from the pool of available service paths, and hence stores just one SPI for each SCI in the path table 808. The LCP in these embodiments selects the service path for each service chain based on costs (such as the number of hosts traversed by each service path and/or the load on the service nodes of the service paths).
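One way to picture the numerical-range approach is the sketch below, which deterministically maps a flow to an SPI. It is an assumption-laden illustration: the helper names spi_for_point and select_spi_by_hash, the use of SHA-256 as the hash, and the string form of the flow key are all hypothetical choices that the description does not prescribe.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate
from typing import List


def spi_for_point(point: int, spis: List[int], weights: List[int]) -> int:
    """Map a number in [0, sum(weights)) to the SPI that owns its range."""
    upper_bounds = list(accumulate(weights))  # e.g. [1, 3, 5, 6] for 1, 2, 2, 1
    return spis[bisect_right(upper_bounds, point)]


def select_spi_by_hash(flow_key: str, spis: List[int], weights: List[int]) -> int:
    """Deterministically generate a number for the flow, then map it to an SPI."""
    digest = hashlib.sha256(flow_key.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % sum(weights)
    return spi_for_point(point, spis, weights)


spis, weights = [1, 2, 3, 4], [1, 2, 2, 1]
# Each SPI owns as many points of the range as its weight value.
print([spi_for_point(p, spis, weights) for p in range(6)])  # [1, 2, 2, 3, 3, 4]
```

Because the number is derived from the flow key, every data message of the same flow maps to the same SPI in this sketch without any per-flow selection state.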

After identifying a service path for the identified service chain, the process 700 next identifies (at 730) the network address for the first hop of the selected service path. In some embodiments, the MAC address for this hop is stored in the same record as the selected path's SPI. Hence, in these embodiments, this MAC address is retrieved from the path selection storage 808 with the selected SPI. In other embodiments, the pre-processor retrieves the first hop's MAC address from an exact match forwarding table 810 that stores next hop network addresses for associated pairs of SPI/SI values, as shown in FIG. 8. In some embodiments, the initial SI values for the service chains are stored in the SI rules of the SI rule storage 806, while in other embodiments, these initial SI values are stored with the SPI values in the path table 808.

At 735, the process 700 specifies the SMD attributes for the datamessage, and associates these attributes with the data message. Asmentioned above, the SMD attributes include in some embodiments the SCI,the SPI, SI and direction values. The service directions for servicepaths are stored with the SPI values in the path table 808 as thedirections through the service chains are dependent on the servicepaths. Also, as mentioned below, a service chain in some embodiments hasto be performed in a forward direction for data messages from a firstGVM to a second GVM, and then in the reverse direction for data messagesfrom the second GVM to the first GVM. For such service chains, thepre-processor 610 selects both the service path for the forwarddirection and the service path for the reverse direction when itprocesses the first data message in the flow from the first GVM to thesecond GVM.

After the SI pre-processor completes its operation, the STL caller 624in the same datapath calls (at 740) the STL port proxy 620 to relay theSMD attributes and first hop's network address that the pre-processoridentified, so that the port proxy can forward the SMD attributesthrough the service plane to the first hop. The operation of the portproxy 620 as well as other modules in the service insertion layers andservice transport layers will be described by reference to FIGS. 9-16 .These figures describe an example of processing the data message fromGVM 102 through a service path that includes the SVM 106, then SVM 108and then SVM 110.

In these figures, each GVM is a compute machine of a tenant in amulti-tenant datacenter, and connects to the software switch through aswitch port that is associated with a guest VNI (GVNI) of the tenant.Also, in these figures, each SVM is a service machine for processing theGVM message traffic, and connects to the software switch through aswitch port that is associated with a service VNI (SVNI) of the tenant.As mentioned above and further described below, some embodiments use theGVNI for performing the guest logical forwarding operations (i.e., forestablishing a guest logical forwarding element, e.g., a logical switchor router, or a guest logical network) for the tenant, while using theSVNI for performing the service logical forwarding operations for thetenant (i.e., for establishing a service logical forwarding element,e.g., a logical switch or router, or a service logical network).

Both of these logical network identifiers (i.e., the GVNI and SVNI) aregenerated for the tenant by the management or control plane in someembodiments. The management or control plane of some embodimentsgenerates different GVNIs and SVNIs for different tenants such that notwo tenants have the same GVNI or SVNI. In some embodiments, each SVM isdedicated to one tenant, while in other embodiments, an SVM can be usedby multiple tenants. In the multi-tenant situation, each SVM can connectto different ports of different service planes (e.g., different logicalswitches) for different tenants.

As shown in FIG. 9 , the port proxy 620 formats the data message forforwarding to the first service node, by replacing the original sourceand destination MAC addresses in the data message with a service planeMAC address that is associated with the source GVM 102 and the MACaddress of the first hop service node. This operation is depicted asoperation 1005 in the process 1000 of FIG. 10 . This process 1000 is aprocess that the port proxy 620 or STL module 626 starts whenever an SImodule (such as an SI pre-processor 610 or a SI proxy 614) is doneprocessing a data message.

In this process 1000, the port proxy also adds (at 1010) the original source and destination MAC addresses of the data message to the set of attributes for the data message that should be processed by other service transport layer modules (e.g., the vswitch, other STL modules, the encap processor, etc.) on the same host computer. The reformatted data message 902 and the augmented attribute set 904 are depicted in FIG. 9.

After reformatting the data message and augmenting its attribute set, the port proxy 620 passes (at 1015) the formatted data message along with its stored attribute set along its egress path where it reaches the software switch 120. Based on the destination MAC address (e.g., the first hop MAC address) of the formatted data message, the software switch determines (at 1020) whether the next hop's port is local. This is the case for the example illustrated in FIG. 9. Hence, the software switch delivers (at 1025) the data message to the switch port associated with the first hop SVM 106. This port then sends the data message along the SVM's ingress path, where the data message 902 and its augmented attribute set 904 are identified by the STL module 626 through a function call of the ingress IO chain of the first hop's SVM, as shown in FIG. 9.

This STL module 626 then re-formats (at 1030) the data message by replacing the GVM's service plane MAC address and the first hop MAC address (i.e., the MAC address of SVM 106) with the original source and destination MAC addresses of the data message, which it retrieves from the augmented attribute set 904. In retrieving the original SMAC and DMAC addresses, the STL module 626 modifies the data message's attribute set. The reformatted data message 906 and the modified attribute set 908 are depicted in FIG. 9. The STL module then passes this re-formatted data message with its accompanying SMD attributes along the SVM's ingress path, where it is next processed by this hop's ingress service proxy 614.
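The MAC address substitution at 1005-1010 and its reversal at 1030 can be illustrated with the following sketch. The DataMessage class, the attribute keys orig_smac and orig_dmac, and the sample MAC strings are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataMessage:
    smac: str
    dmac: str
    payload: bytes = b""


def to_service_plane(msg: DataMessage, attrs: Dict[str, str],
                     current_hop_mac: str, next_hop_mac: str) -> None:
    """Save the original MACs in the attribute set, then write service plane MACs."""
    attrs["orig_smac"] = msg.smac
    attrs["orig_dmac"] = msg.dmac
    msg.smac = current_hop_mac
    msg.dmac = next_hop_mac


def restore_original(msg: DataMessage, attrs: Dict[str, str]) -> None:
    """Put the original MACs back and remove them from the attribute set."""
    msg.smac = attrs.pop("orig_smac")
    msg.dmac = attrs.pop("orig_dmac")


msg = DataMessage(smac="gvm-src-mac", dmac="gvm-dst-mac")
attrs: Dict[str, str] = {}
to_service_plane(msg, attrs, current_hop_mac="spmac", next_hop_mac="hop1mac")
on_wire = (msg.smac, msg.dmac)  # what the software switch forwards on
restore_original(msg, attrs)    # done by the STL module at the service hop
```

In this sketch the software switch only ever sees service plane addresses, while the service node receives the data message with its original addresses restored.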

FIG. 11 illustrates a process 1100 that the service proxy 614 performs in some embodiments each time it receives a data message traversing along the ingress path of a service node. As shown, the service proxy initially makes (at 1105) a copy of the data message if necessary. For instance, in some embodiments, the service node only needs to receive a copy of the data message to perform its operations. One example of such a service node would be a monitoring SVM that needs to obtain a data message copy for its message monitoring or mirroring operation.

In these embodiments, the service proxy copies the data messages andperforms the remaining operations 1110-1125 with respect to this copy,while passing the original data message to the next service hop or backto the source GVM. To forward the original data message to the nextservice hop or back to the GVM, the service proxy has to perform anext-hop lookup based on the SPI/SI values and then provide the next-hopaddress (e.g., the next service hop's address or the service plane MACof the source GVM) to the STL module to forward. These look up andforwarding operations are similar to those described below by referenceto FIGS. 12-14 .

Next, at 1110, the service proxy sets a liveness attribute in the storedSMD attribute set of the data message (which, in some embodiments, mightbe the data message copy at this point). This liveness attribute is avalue that directs the service node to provide a responsive livenessvalue (the same value or related value) with the data message once ithas processed the data message. With this liveness attribute, theservice proxy also provides a sequence number, which the service nodehas to return, or increment and then return, with the responsiveliveness value, as described above.

At 1115, the service proxy formats the data message, if necessary, to put it in a form that can be processed by the service node. For instance, when the service node does not know the current next hop MAC that is set as the destination MAC of the data message, the service proxy changes the destination MAC of the message to a destination MAC associated with the service node.

After formatting the data message to sanitize it for forwarding to theservice node, the service proxy 614 encapsulates (at 1120) the datamessage with one of three encapsulation headers that it can beconfigured to use, and passes (at 1125) the encapsulated message alongthe service node's ingress path so that it can be forwarded to theservice node. FIG. 9 illustrates the encapsulated data message 920passing from the service proxy to the SVM 106 with a native NSHencapsulation header. As shown, the encapsulating header 922 includesthe service chain identifier, the service index, service chain directionand liveness signal. Numerous techniques for applying and usingencapsulating headers are described in U.S. patent application Ser. No.16/444,826, which is incorporated herein by reference. Any of thesetechniques can be used in conjunction with the embodiments describedherein.

The SVMs of some embodiments perform service operations on data messages, then encapsulate the message with an encapsulating header and provide the encapsulated message to the service proxy. Various service operations performed by SVMs of some embodiments are described in U.S. patent application Ser. No. 16/444,826, which is incorporated herein by reference. Any of these operations can be performed by SVMs in conjunction with the embodiments described herein.

In some embodiments, the SVM can also set flow programming attribute(s)in the encapsulating header to direct the service proxy to modify theservice processing of the data message's flow. This flow programmingwill be further described below. After encapsulating the data message,the SVM forwards the data message along its egress path. FIG. 12illustrates an example of SVM 106 returning the encapsulated datamessage 1202 with the SMD and liveness attributes in its encapsulatingheader 1204.

FIG. 13 illustrates a process 1300 that the service proxy 614 performsin some embodiments each time it receives a data message traversingalong the egress path of its service node. As shown, the service proxyin some embodiments initially (at 1305) removes the encapsulation headerfrom the data message, removes the SMD attributes from this header, andstores these attributes in an attribute set that it creates for the datamessage. In some embodiments, the service proxy retrieves (at 1305) someor all of the SMD attributes (e.g., the SPI value, the service plane MACaddress of the source GVM) for the data message from a previous recordthat the service proxy created before giving the data message to theservice node along the ingress path. FIG. 12 illustrates an example ofthe attribute set 1206 that the service proxy 614 creates for thedecapsulated data message 1207.

Next, at 1310, the process resets the liveness timer (e.g., a timer that expires every 0.25 seconds) that it maintains to account for the liveness value that it has received from the service node, which signifies that this node is still operational. With this liveness value, the service proxy receives from the service node a sequence number, which the process validates to ensure that it is the next liveness value that needs to be received.
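A simplified sketch of this liveness bookkeeping is shown below. It rests on several assumptions that the description leaves open: the class name LivenessTracker is hypothetical, the service node is assumed to return each sequence number unchanged and in order, and time is passed in explicitly rather than read from a clock.

```python
from typing import Optional


class LivenessTracker:
    """Hypothetical per-service-node liveness state kept by a service proxy."""

    def __init__(self, timeout_s: float = 0.25) -> None:
        self.timeout_s = timeout_s
        self.next_seq = 0        # next sequence number to hand to the node
        self.expected_seq = 0    # next sequence number expected back
        self.deadline: Optional[float] = None

    def stamp(self, now: float) -> int:
        """Ingress path: pick the sequence number sent with the liveness attribute."""
        seq = self.next_seq
        self.next_seq += 1
        if self.deadline is None:
            self.deadline = now + self.timeout_s  # arm the timer on first use
        return seq

    def on_response(self, seq: int, now: float) -> bool:
        """Egress path: validate the sequence number and reset the timer."""
        if seq != self.expected_seq:
            return False                          # not the next expected value
        self.expected_seq += 1
        self.deadline = now + self.timeout_s
        return True

    def is_alive(self, now: float) -> bool:
        return self.deadline is None or now <= self.deadline


tracker = LivenessTracker()
first_seq = tracker.stamp(now=0.0)
second_seq = tracker.stamp(now=0.05)
```

In this sketch a replayed or out-of-order sequence number is rejected without resetting the timer, so a service node that stops returning fresh liveness values is treated as failed once the timer expires.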

At 1315, the process determines whether the SVM specified any flowprogramming attribute(s), which require the service proxy to direct theSI post processor 612 for the source GVM to perform flow programming bysending to the post processor 612 in-band data messages. In someembodiments, the service proxy sends an in-band flow programming controlsignal with another data message that it generates to send back to thesource GVM, where it will be intercepted by its post processor 612.

When the source GVM receives the data message with the flow programmingcontrol signal, its post processor can uniquely identify the datamessage flow to which it applies by using a flow identifier that isunique to this flow. As further described below, this flow identifier isderived partially based on a unique identifier of the source GVM. Theunique flow identifier also allows other service plane modules, such asthe service nodes, service proxies and STL modules, to uniquely identifyeach data message flow. This unique flow identifier in some embodimentsis part of the SMD attributes that are passed between the service hopsof a service path and passed back to the source GVM.

In some embodiments, however, the service proxy sends the in-band flowprogramming control signal with the current data message that it isprocessing. In some of these embodiments, the service proxy does thisonly when its associated service node is the last hop service node ofthe service path, while in other embodiments it does this even when itsservice node is not the last hop service node. When its service node isnot the last hop service node of the service path, the service proxyembeds the flow programming in the SMD attributes of the data message,which in some embodiments eventually get forwarded to the source GVM'sSI post processor as part of the data message encapsulation header whenthe last hop service is performed. Even in this situation, the serviceproxy of the last hop in other embodiments sends the flow programmingsignal as a separate message.

The flow programming signals will be further described below byreference to FIG. 17 . Also, as further described below, the serviceproxy also sends flow programming signals back to the source GVM when itdetects that its service node has failed so that the classifier at thesource GVM can select another service path for the current data messageflow, as well as other data message flows. In such a situation, theservice proxy also notifies the LCP on its host computer, so that theLCP can notify the CCP and the CCP, in turn, can modify the servicepaths specified for service chains that use the failed service node.

At 1320, the process 1300 determines whether its service node specifiedthat the data message should be dropped. If so, the process drops thedata message and then ends. Otherwise, assuming the data message shouldnot be dropped and should continue along its service path, the serviceproxy in some embodiments decrements (at 1325) the SI value in case theservice node has not decremented the SI value, and then uses (at 1330)this decremented value along with the SPI value in the data message'sstored attribute set to identify an exact match forwarding rule thatidentifies a next hop network address. When the proxy's service node ison multiple service paths, the proxy's forwarding rule storage storesmultiple exact match forwarding rules that can specify different nexthop network addresses for different SPI/SI values.

When the decremented SI value is zero, the service proxy in some embodiments matches the decremented SI value and the embedded SPI value with a rule that directs the service proxy to identify the next hop as the service plane MAC address of the source GVM. This rule in some embodiments does not provide a MAC address, but rather refers to the service plane MAC address that is part of the SMD attribute set stored for the data message. In some embodiments, this instruction for returning the data message to the service plane MAC address of the source GVM when the SI value is zero is not specified by a forwarding entry of a forwarding table, but rather is hard coded into the logic of the service proxy.
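The next-hop determination at 1325-1330 can be sketched as follows, using the variant in which the return to the source GVM at an SI value of zero is hard coded rather than stored as a forwarding entry. The function name next_hop, the SPI value 5, and the table contents are hypothetical and serve only as an example of a three-hop service path.

```python
from typing import Dict, Tuple


def next_hop(spi: int, si: int, node_decremented: bool,
             rules: Dict[Tuple[int, int], str],
             source_spmac: str) -> Tuple[int, str]:
    """Return the (possibly decremented) SI value and the next hop MAC address."""
    if not node_decremented:
        si -= 1                      # 1325: the proxy decrements on the node's behalf
    if si == 0:
        return si, source_spmac      # hard coded return to the source GVM
    return si, rules[(spi, si)]      # 1330: exact match on the SPI/SI pair


# Hypothetical exact match forwarding rules for a three-hop path with SPI 5.
rules = {(5, 2): "hop2mac", (5, 1): "hop3mac"}

si, hop_after_first = next_hop(5, 3, False, rules, "spmac")
si, hop_after_second = next_hop(5, si, False, rules, "spmac")
si, hop_after_third = next_hop(5, si, False, rules, "spmac")
print(hop_after_first, hop_after_second, hop_after_third)  # hop2mac hop3mac spmac
```

Starting from an initial SI value of 3, the three successive lookups in this sketch return the second hop, the third hop, and finally the service plane MAC of the source GVM.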

At 1330, the service proxy stores the next hop network address (e.g., MAC address) in the attribute set that is stored for the data message. FIG. 12 illustrates an example of the service proxy 614 storing the next hop MAC address associated with the next service node in the attribute set 1206 of the decapsulated data message 1207. After identifying the next hop network address, the service proxy returns (at 1335) the data message to the egress path of its service node, and the process 1300 ends.

Once the service proxy returns the data message to the service node's egress path, the STL module 626 receives this data message and commences the process 1000 of FIG. 10. The STL module 626 performs the first three operations 1005-1015 of this process each time it receives a data message from a service insertion layer. Specifically, the STL module formats (at 1005) the data message for forwarding to the next hop service node, by replacing the original source and destination MAC addresses in the data message with the service plane MAC addresses of the current service hop and the next service hop (i.e., the hop1mac and hop2mac addresses in the example illustrated in FIG. 12).

At 1010, the STL module also adds the original source and destination MAC addresses of the data message to the set of attributes for the data message that should be processed by other service transport layer modules (e.g., the vswitch, the encap processor, etc.) on the same host computer. The reformatted data message 1208 and the augmented attribute set 1210 are depicted in FIG. 12. After reformatting the data message and augmenting its attribute set, the STL module 626 passes (at 1015) the formatted data message along the egress path, where it next reaches the software switch 120.

Based on the destination MAC address (i.e., the next hop MAC address) ofthe formatted data message, the software switch determines (at 1020)that the next hop's port is not local. Hence, the software switchprovides (at 1035) the data message to the uplink port 1250 thatconnects to a VTEP1 that communicates through an overlay network tunnelwith a VTEP2 on host 114, as illustrated in the example of FIG. 12 . Asshown, an STL encap processor 628 along the egress path of this uplinkport (at 1040) receives this data message (e.g., is called as one of thehooks specified for the uplink port), defines an encapsulating overlayheader 1240 for this data message and encapsulates the data message withthis overlay header.

In some embodiments, the overlay header is a Geneve header that stores the SMD and STL attributes in one or more of its TLVs. As mentioned above, the SMD attributes in some embodiments include the SCI value, the SPI value, the SI value, and the service direction. Also, in some embodiments, the STL attributes include the original L2 source MAC address and the original L2 destination MAC address. FIG. 12 illustrates an example of this encapsulating header.

When the encapsulated data message is received at the next hop's host computer 114, the data message is captured by the STL encap processor 628 of (e.g., defined as a hook for) a downlink port 1252 that connects to the VTEP connecting through the overlay network tunnel to the prior hop's VTEP. FIG. 14 illustrates a process 1400 started by an encap processor 628 on a next hop computer that receives an encapsulated data message that needs to be processed by an SVM executing on its computer.

As shown, this encap processor removes (at 1405) the encapsulationheader from the data message, and stores (at 1405) the STL and SMDattributes as the associated set of attributes of the data message. Itthen passes (at 1410) the decapsulated message to the downlink port,which then passes it to the software switch to forward (at 1415) to itsport that is connected to the next hop SVM (i.e., that is associatedwith the destination MAC address). This port then passes the datamessage 1208 and the attribute set 1210 to the ingress path of the nexthop SVM, as shown in the example of FIG. 12 for the SVM 108.

The STL module 626 on this ingress path then re-formats (at 1420) the data message by replacing the previous and current hop service plane MAC addresses (i.e., the hop1mac and hop2mac) with the original source and destination MAC addresses of the data message, which it retrieves from the data message attribute set. In retrieving the original SMAC and DMAC addresses, the STL module 626 modifies the data message's attribute set. The reformatted data message 1230 and the modified attribute set 1232 are depicted in FIG. 12. The STL module then passes this re-formatted data message with its accompanying SMD attributes along the SVM's ingress path, where it is next processed by this hop's ingress service proxy 614.

The operation of this service proxy is as described above by referenceto FIGS. 9 and 11 . FIG. 12 shows the service proxy of SVM 108 on host114 passing an encapsulated data message to the SVM. The encapsulatingheader of this data message is supported by the SVM 108 and stores theSCI, SI, service direction and liveness values. In some embodiments, theSVMs that are part of the same service path support differentencapsulating headers. In some of these embodiments, the service proxiesalong a service path can encapsulate the data message with differentencapsulating headers before passing the data message to theirassociated SVMs. For instance, in one case, the first hop service proxypasses to the SVM 106 the data message with an NSH encapsulating header,while the second hop service proxy passes to the SVM 108 the datamessage with a QinQ encapsulating header.

Once the SVM 108 performs its service operation on the data message, the SVM sends the processed data message along its egress data path, as shown in FIG. 15. As shown, the service proxy then identifies the MAC address of the next service hop and adds this MAC address to the stored attribute set for the data message. At this point, the next hop is the third service hop, which corresponds to the SVM 110. This proxy identifies this MAC by decrementing the SI value (when the SVM 108 did not decrement the SI value) and then using the embedded SPI value and decremented SI value to look up a forwarding rule that provides the next hop's MAC address. The STL module in this egress path then replaces the original SMAC and DMAC in the data message with the current hop and next hop MAC addresses (i.e., the hop2mac and the hop3mac in the example of FIG. 15), stores the original SMAC and DMAC in the stored attribute set of the data message, and then passes the data message along the egress path where it is received by the software switch 122.

The software switch then determines that the next hop is associated with its uplink port 1252, and hence passes the data message to this port. As shown in FIG. 15, the encap processor 628 on the egress path of this port (e.g., specified as a hook on this egress path) then encapsulates the data message with a Geneve header that stores the SMD and STL attributes in one or more of its TLVs and specifies that the data message is traversing from this port's associated VTEP2 to VTEP3 that is associated with port 1254 of host 116.

The STL encap processor 628 in the ingress path of port 1254 thenremoves the encapsulation header from the data message and stores theSTL and SMD attributes as the associated set of attributes of the datamessage. It then passes the decapsulated message to the port 1254, whichthen passes it to the software switch 124 to forward to its portconnected to the next hop SVM 110 (i.e., to its port associated with theservice plane DMAC). This port then passes the data message andattribute set to the ingress path of this SVM, as shown in FIG. 15 .

The STL module 626 in this ingress path replaces the previous and current hop service plane MAC addresses (i.e., the hop2mac and hop3mac) with the original source and destination MAC addresses of the data message, which it retrieves from the data message attribute set. The STL module 626 also modifies the data message's attribute set by removing the original SMAC and DMAC addresses, and then passes the re-formatted data message with its accompanying SMD attributes along the SVM's ingress path for this hop's ingress service proxy 614 to process. This service proxy passes to the SVM 110 an encapsulated data message with an encapsulating header supported by the SVM 110 and storing the SCI, SI, service direction and liveness values.

Once the SVM 110 performs its service operation on this data message, the SVM sends the processed data message along its egress data path, as shown in FIG. 16. The service proxy decrements the SI value, assuming that the SVM 110 has not done so already. In this example, the decremented SI value is now zero. In some embodiments, the service proxy then matches this SI value and the SPI value to a rule identifier of a forwarding rule that specifies that it should select the service plane MAC (spmac) of the source GVM as the next hop MAC address. In other embodiments, the hardcoded logic of the service proxy directs it to identify the service plane MAC of the source GVM as the next hop MAC. In either case, the service proxy adds the source GVM's service plane MAC to the attribute set of the data message.

The STL module next replaces the original SMAC and DMAC in the datamessage with the third hop MAC address and the source GVM's serviceplane MAC, stores the original SMAC and DMAC in the stored attribute setof the data message, and then passes the data message to its softwareswitch 124. The software switch then determines that the next hop isassociated with its port 1254, and hence passes the data message to thisport. As shown in FIG. 16 , the encap processor 628 on the egress pathof this port then encapsulates the data message with a Geneve headerthat stores the SMD and STL attributes in one or more TLVs and specifiesthat the data message is traversing from this port's associated VTEP3 toVTEP1 that is associated with port 1250 of host 112.

The STL encap processor 628 in the ingress path of port 1250 thenremoves the encapsulation header from the data message and stores theSTL and SMD attributes as the associated set of attributes of the datamessage. It then passes the decapsulated message to the port 1250, whichthen passes it to the software switch 120 to forward to its portconnected to the port proxy 620. This port then passes the data messageand attribute set to the port proxy 620, as shown in FIG. 16 .

The port proxy 620 then replaces the previous and current hop service plane MAC addresses (i.e., the hop3mac and spmac) with the original source and destination MAC addresses of the data message, which it retrieves from the data message attribute set. The port proxy 620 also modifies the data message's attribute set to remove the original SMAC and DMAC, and then passes this re-formatted data message with its accompanying SMD attributes back to the STL caller 624 that called it in the first place. In some embodiments, the port proxy uses a connection record that it created when the STL caller originally called it, to identify the STL caller to call back. In other embodiments, the port proxy uses a mapping table that maps each service plane MAC with a GVM's STL caller. The mapping table in some embodiments has records that associate service plane MACs and service directions with guest forwarding plane port identifiers associated with the GVMs.

Once called, the STL caller passes the data message along the egress path of GVM 102, where it will next be forwarded to the SI post-processor 612. FIG. 17 illustrates a process 1700 that the SI post-processor 612 performs in some embodiments. The post-processor performs this process 1700 each time it receives a data message that is passed to it along a GVM's IO chain. As shown, the post processor 612 in some embodiments initially determines (at 1705) whether it needs to examine the received data message for SI post processing. This is because, as a module along a GVM's IO chain, the post processor will get called for all data message flows that pass along this IO chain, and some of these data messages might not match an SI rule that requires service insertion operations to be performed on them. In some embodiments, the process 1700 determines (at 1705) whether it needs to process the data message by determining whether the data message has associated service metadata. If not, the process transitions to 1720, which will be described below.

When the SI post processor 612 determines that it needs to process the data message, the process determines (at 1710) whether the SMD metadata associated with the data message specifies a flow programming tag that requires the post processor to perform a flow programming operation. In some embodiments, such a flow programming tag would be specified in the data message's SMD attributes by a service node to change the service path processing at the source GVM, or by a service proxy for the same reason when it detects failure of its service node. When the flow programming tag does not specify any flow programming, the process transitions to 1720, which will be described below.

Otherwise, when the flow programming tag specifies a flow programming operation, the process 1700 performs this operation, and then transitions to 1720. The flow programming operation entails in some embodiments modifying the connection record in the connection tracking storage 804 to specify the desired operation and/or SMD attributes (e.g., allow, drop, etc.) for the data message's flow. The post processor's writing to the connection tracker 804 is depicted in FIG. 16. As mentioned above and further described below, the SMD metadata for the processed data message includes a flow identifier that uniquely identifies the data message's flow by being at least partially derived from the unique service plane identifier of the source GVM. The post processor 612 uses this flow identifier to match the data message's flow in the connection tracker in some embodiments.

In some embodiments, the flow programming tag can specify the following operations: (1) NONE when no action is required (which causes no flow programming operation to be performed), (2) DROP when no further data messages of this flow should be forwarded along the service chain and instead should be dropped at the source GVM, and (3) ACCEPT when no further data messages of this flow should be forwarded along the service chain and instead the flow should be accepted at the source GVM. In some embodiments, the flow programming tag can also specify DROP_MESSAGE. The DROP_MESSAGE is used when the service node needs to communicate with the proxy (e.g., to respond to a ping request) and wants the user data message (if any) to be dropped, even though no flow programming at the source is desired.

In some embodiments, an additional action is available for the service proxies to internally communicate failure of their SVMs. This action would direct the SI post processor in some embodiments to select another service path (e.g., another SPI) for the data message's flow. This action in some embodiments is carried in-band with a user data message by setting an appropriate metadata field. For instance, as further described below, the service proxies communicate with the post processor of the source GVM through OAM (Operation, Administration, and Maintenance) metadata of the NSH attributes through in-band data message traffic over the data plane. Given that by design flow programming actions are affected by signaling delays and are subject to loss, an SVM or service proxy might still see data messages belonging to a flow that was expected to be dropped, accepted or re-directed at the source for some time after communicating the flow programming action to the proxy. In this case, the service plane should continue to set the action to drop, allow or redirect at the source.

The process 1700 transitions to 1720 after completing the flow programming operation. It also transitions to 1720 when it determines (at 1705) that no SI post processing needs to be performed on the data message or determines that no flow programming needs to be performed for this data message. At 1720, the process 1700 lets the data message through the egress path of GVM 102, and then ends.
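The control flow of process 1700 can be sketched as follows. This is a minimal illustration only: the class names, the `flow_tag` and `flow_id` keys, and the dictionary-based connection tracker are hypothetical stand-ins for the SMD attributes and the connection tracking storage 804, and DROP_MESSAGE and the path re-selection action are omitted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class FlowTag(Enum):
    """Flow programming tags named in the description (subset)."""
    NONE = "none"      # no flow programming requested
    DROP = "drop"      # drop further messages of this flow at the source
    ACCEPT = "accept"  # accept further messages at the source, skipping the chain


@dataclass
class DataMessage:
    payload: bytes
    smd: Optional[dict] = None  # service metadata; None when no SI rule applied


@dataclass
class ConnectionTracker:
    # Hypothetical stand-in for storage 804: flow identifier -> programmed action.
    records: Dict[str, str] = field(default_factory=dict)


def si_post_process(msg: DataMessage, tracker: ConnectionTracker) -> DataMessage:
    # 1705: a message without service metadata needs no SI post processing.
    if msg.smd is not None:
        # 1710: check whether the SMD carries a flow programming tag.
        tag = msg.smd.get("flow_tag", FlowTag.NONE)
        if tag is not FlowTag.NONE:
            # Flow programming: record the requested action for this flow.
            tracker.records[msg.smd["flow_id"]] = tag.value
    # 1720: let the message continue along the GVM's egress path.
    return msg
```

In this sketch every branch ends at the same pass-through step, mirroring how the process always transitions to 1720.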

The examples described above by reference to FIGS. 8, 9, 12, 15, and 16 show service plane operations that are performed on a data message that is identified along the egress path of a source GVM. These service plane operations (described by reference to FIGS. 7, 10, 11, 13, 14 and 17) are equally applicable to data messages that are identified as they traverse along the ingress path of a source GVM. To perform these ingress side operations, the SI pre and post processors 610 and 612 on the ingress path are flipped as compared to the locations of these two processors on the egress path. Specifically, as shown in FIG. 6, the preprocessor 610 receives a data message that enters the GVM's ingress path from the software switch port that is associated with this GVM's VNIC, while the post processor 612 passes the processed data message along the ingress IO chain to the GVM's VNIC.

However, the service insertion and service transport operations for the ingress side processing are similar to the egress side processing of data messages to and from a particular GVM. In some cases, this GVM exchanges data messages with another GVM. As described above by reference to FIGS. 4 and 5, the service plane can be directed to perform the same service chain on the data messages in each direction, but in the opposite order. In such cases, the service nodes for the service path on the ingress side perform a series of service operations for a first direction of the service chain for data messages that the other GVM sends to the particular GVM, while the service nodes for the service path on the egress side perform the same series of service operations but in a second, opposite direction through the service chain. Also, as mentioned above, the two sets of service nodes for the forward and reverse directions include the same service nodes in some embodiments.

FIG. 18 conceptually illustrates several operations that the network managers and controllers perform in some embodiments to define rules for service insertion, next service hop forwarding, and service processing. As shown, these operations are performed by a service registrator 1804, a service chain creator 1806, a service rule creator 1808, a service path generator 1812, a service plane rule generator 1810, and a rule distributor 1814. In some embodiments, each of these operators can be implemented by one or more modules of a network manager or controller and/or can be implemented by one or more standalone servers.

Through a service partner interface 1802 (e.g., a set of APIs or a partner user interface (UI) portal), the service registrator 1804 receives vendor templates 1805 that specify services that different service partners perform. These templates define the partner services in terms of one or more service descriptors, including service profiles. The registrator 1804 stores the service profiles in a profile storage 1807 for the service chain creator 1806 to use to define service chains.

Specifically, through a user interface 1818 (e.g., a set of APIs or a UI portal), the service chain creator 1806 receives from a network administrator (e.g., a datacenter administrator, a tenant administrator, etc.) one or more service chain definitions. In some embodiments, each service chain definition associates a service chain identifier, which identifies the service chain, with an ordered sequence of one or more service profiles. Each service profile in a defined service chain is associated with a service operation that needs to be performed by a service node. The service chain creator 1806 stores the definition of each service chain in the service chain storage 1820.

Through the user interface 1818 (e.g., a set of APIs or a UI portal), the service rule creator 1808 receives from a network administrator (e.g., a datacenter administrator, a tenant administrator, etc.) one or more service insertion rules. In some embodiments, each service insertion rule associates a set of data message flow attributes with a service chain identifier. The flow attributes in some embodiments are flow header attributes, like L2 attributes or L3/L4 attributes (e.g., five tuple attributes). In these or other embodiments, the flow attributes are contextual attributes (e.g., AppID, process ID, active directory ID, etc.). Numerous techniques for capturing and using contextual attributes for performing forwarding and service operations are described in U.S. patent application Ser. No. 15/650,251, which is incorporated herein. Any of these techniques can be used in conjunction with the embodiments described herein.

The service rule creator 1808 generates one or more service insertion rules and stores these rules in the SI rule storage 1822. In some embodiments, each service insertion rule has a rule identifier and a service chain identifier. The rule identifier in some embodiments can be defined in terms of flow identifiers (e.g., header attributes, contextual attributes, etc.) that identify data message flow(s) to which the SI rule is applicable. The service chain identifier of each SI rule, on the other hand, identifies the service chain that has to be performed by the service plane for any data message flow that matches the rule identifier of the SI rule.
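As an illustration of how such a rule might be matched, the following sketch represents each SI rule as a set of flow attributes paired with a service chain identifier. The attribute names, the first-match ordering, and the `classify` helper are assumptions made for the example and are not specified by the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SIRule:
    match: Dict[str, str]   # rule identifier: flow attributes that must all match
    service_chain_id: str   # chain to perform on matching flows


def classify(flow: Dict[str, str], rules: List[SIRule]) -> Optional[str]:
    """Return the service chain ID of the first rule whose attributes all match.

    First-match semantics are an assumption of this sketch.
    """
    for rule in rules:
        if all(flow.get(key) == value for key, value in rule.match.items()):
            return rule.service_chain_id
    return None  # no SI rule applies; no service insertion is needed


# Hypothetical rules: one on header attributes, one on a contextual attribute.
rules = [
    SIRule({"protocol": "tcp", "dst_port": "443"}, "SC-1"),
    SIRule({"app_id": "ssh"}, "SC-2"),
]
```

A flow that matches no rule returns `None`, which corresponds to a data message that requires no service insertion operation.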

For each service chain that is part of a service rule, the service path generator 1812 generates one or more service paths, with each path identifying one or more service instance endpoints for one or more service nodes to perform the service operations specified by the chain's sequence of service profiles. In some embodiments, the process that generates the service paths for a service chain accounts for one or more criteria, such as (1) the data message processing load on the service nodes (e.g., SVMs) that are candidate service nodes for the service paths, (2) the number of host computers crossed by the data messages of a flow as they traverse each candidate service path, etc.

The generation of these service paths is further described in U.S. patent application Ser. No. 16/282,802, which is incorporated herein by reference. As described in this patent application, some embodiments identify the service paths to use for a particular GVM on a particular host based on one or more metrics, such as host crossing count (indicating how many times a data message traversing the service path crosses hosts), a locality count (indicating how many of the SIRs along this path are located on the local host), etc. Other embodiments identify service paths (i.e., select service nodes for service paths) based on other metrics, such as financial and licensing metrics.
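The two metrics named above can be computed as in the following sketch. It assumes that each service path is given as the list of hosts on which its service nodes run, that the data message starts at and returns to the source GVM's host, and that paths are ranked by fewest host crossings and then by highest locality. These representation and ranking choices are illustrative assumptions, not the method of the referenced application.

```python
from typing import Dict, List


def host_crossing_count(source_host: str, path_hosts: List[str]) -> int:
    """Count host-to-host transitions from the source host, through the path, and back."""
    hops = [source_host] + path_hosts + [source_host]
    return sum(1 for a, b in zip(hops, hops[1:]) if a != b)


def locality_count(source_host: str, path_hosts: List[str]) -> int:
    """Count the service nodes on the path that run on the source GVM's host."""
    return sum(1 for host in path_hosts if host == source_host)


def rank_paths(source_host: str, paths: Dict[str, List[str]]) -> List[str]:
    """Order SPIs by fewest host crossings, then by most local service nodes."""
    return sorted(
        paths,
        key=lambda spi: (
            host_crossing_count(source_host, paths[spi]),
            -locality_count(source_host, paths[spi]),
        ),
    )


# Hypothetical placement of three-hop paths for a GVM on host "h1".
paths = {
    "SPI-1": ["h1", "h1", "h2"],
    "SPI-2": ["h2", "h3", "h2"],
    "SPI-3": ["h1", "h1", "h1"],
}
```

With these hypothetical placements, the fully local path ranks first because its data messages never leave the source host.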

The service path generator 1812 stores the identity of the generated service paths in the service path storage 1824. This storage in some embodiments associates each service chain identifier with one or more service path identifiers, and for each service path (i.e., each SPI) it provides a list of service instance endpoints that define the service path. Some embodiments store the service path definitions in one data storage, while storing the association between the service chain and its service paths in another data storage.

The service rule generator 1810 then generates rules for service insertion, next service hop forwarding, and service processing from the rules stored in storages 1820, 1822 and 1824, and stores these rules in rule storages 1826, 1828 and 1830, from where the rule distributor 1814 can retrieve these rules and distribute them to the SI pre-processors, service proxies and service nodes. The distributor 1814 also distributes in some embodiments the path definitions from the service path storage 1824. The path definitions in some embodiments include the first hop network address (e.g., MAC address) of the first hop along each path. In some embodiments, the service rule generator 1810 and/or the rule distributor 1814 specify and distribute different sets of service paths for the same service chain to different host computers, as different sets of service paths are optimal or preferred for different host computers.

In some embodiments, the SI classification rules that are stored in the rule storage 1826 associate flow identifiers with service chain identifiers. Hence, in some embodiments, the rule generator 1810 retrieves these rules from the storage 1822 and stores them in the classification rule storage 1826. In some embodiments, the rule distributor 1814 directly retrieves the classification rules from the SI rule storage 1822. For these embodiments, the depiction of the SI classification rule storage 1826 is more of a conceptual illustration to highlight the three types of distributed rules, along with the next-hop forwarding rules and the service node rules.

In some embodiments, the service rule generator 1810 generates the next hop forwarding rules for each hop service proxy of each service path for each service chain. As mentioned above, each service proxy's forwarding table in some embodiments has a forwarding rule that identifies the next hop network address for each service path on which the proxy's associated service node resides. Each such forwarding rule maps the current SPI/SI values to the next hop network address. The service rule generator 1810 generates these rules. For the embodiments in which the SI pre-processor has to look-up the first hop network address, the service rule generator also generates the first hop look-up rule for the SI pre-processor.
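A minimal sketch of deriving per-proxy forwarding rules from one path definition follows. It assumes that the service index (SI) starts at the number of hops and decreases by one per hop, that each proxy's table is keyed by its hop's MAC address, and that the last hop uses a hypothetical `RETURN_TO_SOURCE` marker instead of a next hop address. None of these conventions is taken from the description.

```python
from typing import Dict, List, Tuple

# Hypothetical marker for the last hop, which returns the message to the source GVM.
RETURN_TO_SOURCE = "return-to-source"

ForwardingTable = Dict[Tuple[int, int], str]  # (SPI, SI) -> next hop address


def build_forwarding_rules(spi: int, hop_macs: List[str]) -> Dict[str, ForwardingTable]:
    """Build one forwarding table per hop proxy for a single service path.

    Assumes SI counts down from len(hop_macs) to 1 along the path.
    """
    tables: Dict[str, ForwardingTable] = {}
    hop_count = len(hop_macs)
    for index, mac in enumerate(hop_macs):
        is_last = index + 1 == hop_count
        next_hop = RETURN_TO_SOURCE if is_last else hop_macs[index + 1]
        # Each proxy maps the current SPI/SI pair to the next hop address.
        tables.setdefault(mac, {})[(spi, hop_count - index)] = next_hop
    return tables


# Hypothetical three-hop path with SPI 6.
tables = build_forwarding_rules(6, ["mac-a", "mac-b", "mac-c"])
```

Keying each table on the SPI/SI pair lets one proxy serve several service paths, since the same service node can appear on more than one path.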

Also, in some embodiments, the service rule generator 1810 generates for the service nodes service rules that map service chain identifier, service index values and service directions to service profiles of the service nodes. To do this, the service rule generator uses the service chain and service path definitions from the storages 1820 and 1824, as well as the service profile definitions from the service profile storage 1807. In some embodiments, the rule distributor forwards the service node rules to a service node through a service manager of the service node when such a service manager exists. The service profile definitions are also distributed by the distributor 1814 to the host computers (e.g., to their LCPs) in some embodiments, so that these host computers (e.g., the LCPs) can use these service profiles to configure their service proxies, e.g., to configure the service proxies to forward received data messages to their service nodes, or to copy the received data messages and forward the copies to their service nodes, while forwarding the original received data messages to their next service node hops or back to their source GVMs when they are the last hops.

In some embodiments, the management and control planes dynamically modify the service paths for a service chain, based on the status of the service nodes of the service paths and the data message processing loads on these service nodes. FIG. 19 illustrates how service paths are dynamically modified in some embodiments. In these embodiments, a central control plane (CCP) 1900 works with a local control plane (LCP) 1910 on the host computers 1920 to define service paths for a service chain, and to modify these service paths. The CCP 1900 in some embodiments is a cluster of servers (e.g., three servers) that provide control plane operations for defining configurations based on service rules specified by network administrators through a cluster of management servers that provide management operations.

As shown, the CCP has a status updater 1902 that receives service node status data from status publishers 1903 on the host computers 1920. As mentioned above, each time that a service proxy determines that its associated service node has failed (e.g., each time a service node fails to respond to the service proxy's liveness signal twice in a row), the service proxy notifies the LCP 1910 of its host. The LCP then has its status publisher 1903 notify the CCP's status updater 1902 of the service node's failure.
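The "twice in a row" failure example can be sketched as a small counter kept by the service proxy. The class name, the configurable `max_missed` limit, and the reset on any reply are assumptions of this sketch.

```python
class LivenessMonitor:
    """Tracks consecutive missed liveness replies for one service node."""

    def __init__(self, max_missed: int = 2) -> None:
        self.max_missed = max_missed  # two misses in a row, per the example above
        self.missed = 0
        self.failed = False

    def record_probe(self, replied: bool) -> bool:
        """Record one probe result.

        Returns True only on the probe that first marks the node as failed,
        which is when the proxy would notify the LCP of its host.
        """
        if replied:
            # Any reply resets the run of misses (an assumption of this sketch).
            self.missed = 0
            self.failed = False
            return False
        self.missed += 1
        if self.missed >= self.max_missed and not self.failed:
            self.failed = True
            return True
        return False
```

Returning True only once per failure keeps the proxy from notifying its LCP again for every later missed probe.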

The status updater 1902 relays any service node failures to the service path generator 1812, which in some embodiments is part of the CCP along with the SP rule generator 1810 and a statistic collector 1904. Each time a service node fails, the service path generator removes from the service path storage 1824 its previously defined service paths that use this service node. For each removed service path, the service path generator 1812 deletes or deactivates the removed path's SPI value for the service chain identifier of the corresponding service chain.

In some embodiments, each removed service path is removed (e.g., deleted or deactivated) from the records of all hosts that previously received forwarding rules or path definitions that were for this service path. In some embodiments, the CCP (e.g., the service path generator 1812 or the rule distributor 1814) directs these hosts to remove the service path from the forwarding and path definition rules of their forwarding rule storages 1928 and path definition storage 808. The LCP of the failed service node in some embodiments removes the service path from its forwarding and path definition rules, while in other embodiments even this LCP waits for instructions to do so from the CCP.
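The path removal that follows a service node failure can be sketched as below. The two dictionaries are hypothetical stand-ins for the service path storage 1824 (SPI to service nodes, and service chain identifier to SPIs), and the sketch deletes paths rather than deactivating them.

```python
from typing import Dict, List, Set


def remove_paths_for_failed_node(
    path_nodes: Dict[str, List[str]],
    chain_paths: Dict[str, Set[str]],
    failed_node: str,
) -> List[str]:
    """Delete every service path that uses the failed node; return the removed SPIs."""
    removed = [spi for spi, nodes in path_nodes.items() if failed_node in nodes]
    for spi in removed:
        del path_nodes[spi]
    # Drop the removed SPIs from each service chain's set of usable paths.
    for spis in chain_paths.values():
        spis.difference_update(removed)
    return removed


# Hypothetical storage contents for one service chain with three paths.
path_nodes = {
    "SPI-1": ["svm-a", "svm-d"],
    "SPI-2": ["svm-b", "svm-d"],
    "SPI-3": ["svm-b", "svm-e"],
}
chain_paths = {"SC-1": {"SPI-1", "SPI-2", "SPI-3"}}
removed = remove_paths_for_failed_node(path_nodes, chain_paths, "svm-d")
```

The returned list of SPIs is what a controller could then send to the hosts that hold forwarding rules or path definitions for those paths.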

Each host 1920 also has a statistics publisher 1905 that publishes data message load statistics that the service proxies generate for their service nodes in some embodiments. Each time a service proxy receives a data message that has been processed by its service node, the service proxy in some embodiments increments statistics (e.g., data message count, byte count, etc.) that it maintains in a statistic storage 1907 for its service node. In some embodiments, the statistics publisher 1905 periodically or on-demand retrieves the collected statistics from the storage 1907 and forwards these statistics to a statistic collector 1904 of the CCP. In some embodiments, the statistics collector 1904 receives (through the management plane) statistics that the service managers of the service nodes receive from the service nodes.
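The per-node counters and their publication can be sketched as follows. The `ProxyStats` class, its method names, and the snapshot format are hypothetical; only the message count and byte count come from the description.

```python
from collections import defaultdict
from typing import Dict


class ProxyStats:
    """Hypothetical stand-in for the statistic storage 1907 on one host."""

    def __init__(self) -> None:
        self._counters: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {"messages": 0, "bytes": 0}
        )

    def record(self, node_id: str, message: bytes) -> None:
        """Called by a service proxy when its service node returns a processed message."""
        counters = self._counters[node_id]
        counters["messages"] += 1
        counters["bytes"] += len(message)

    def snapshot(self) -> Dict[str, Dict[str, int]]:
        """Return a copy of the counters for the statistics publisher to forward."""
        return {node: dict(counters) for node, counters in self._counters.items()}


stats = ProxyStats()
stats.record("svm-a", b"hello")
stats.record("svm-a", b"hi")
stats.record("svm-b", b"x" * 100)
```

Returning a copy from `snapshot` lets the publisher forward a stable view while the proxies keep incrementing the live counters.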

The statistics collector 1904 relays the collected statistics to the service path generator 1812. As mentioned above, the service path generator in some embodiments defines the service paths through the service nodes based in part on the data message load on the service nodes. For instance, when the data message load on a service node exceeds a threshold value, the service path generator performs one or more actions in some embodiments to reduce the load on this service node. For instance, in some embodiments, it stops adding the service node to any new service paths that it might define. In these or other embodiments, it also directs the distributor 1814 to remove the service paths that use this service node from some or all of the hosts.
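The two load-reduction actions described above can be sketched as simple filters over the collected load statistics. The function names, the load units, and the single shared threshold are assumptions made for illustration.

```python
from typing import Dict, List


def eligible_nodes(load_by_node: Dict[str, float], threshold: float) -> List[str]:
    """Nodes that may still be added to newly defined service paths."""
    return sorted(node for node, load in load_by_node.items() if load <= threshold)


def paths_to_withdraw(
    path_nodes: Dict[str, List[str]],
    load_by_node: Dict[str, float],
    threshold: float,
) -> List[str]:
    """SPIs that use at least one node whose load exceeds the threshold."""
    overloaded = {node for node, load in load_by_node.items() if load > threshold}
    return sorted(spi for spi, nodes in path_nodes.items() if overloaded.intersection(nodes))


# Hypothetical loads (messages per second) and path definitions.
load_by_node = {"svm-a": 900.0, "svm-b": 1500.0, "svm-c": 400.0}
path_nodes = {"SPI-1": ["svm-a", "svm-c"], "SPI-2": ["svm-b", "svm-c"]}
```

Whether the withdrawn paths are removed from some or all hosts is a separate policy decision that this sketch leaves to the caller.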

Conjunctively or alternatively, the service path generator directs a CCP module (e.g., the distributor 1814) to direct the LCPs of one or more host computers to adjust the selection criteria 820 used for selecting service paths that the LCPs generate in order to control how the SI pre-processor performs its path selections. In other embodiments, the service path generator or another CCP module aggregates the load statistics for each service node and distributes the aggregated load to host LCPs along with their associated SPI values so that the LCPs can analyze these statistics and adjust the path selection criteria that they generate. In some embodiments, each LCP uses or has a path evaluator 1915 to generate the path selection criteria to evaluate and select paths based on service node statistics, and/or based on other criteria, such as number of hosts traversed by each service path.

In some embodiments, the servers that implement the management plane, the control plane, and the service managers are in the same datacenter as the host computers on which the guest and service machines and modules (e.g., GVMs, SVMs, service proxies, port proxies, STL modules, SFEs, etc.) execute. In these embodiments, the management plane servers, the control plane servers, the service managers and the host computer modules (e.g., the LCPs, SVMs, GVMs, hypervisor modules, etc.) communicate with each other through the shared network infrastructure (e.g., the switches, routers, wired and wireless links, etc.) of the datacenter.

In other embodiments, the management plane servers, the control plane servers, the service managers and/or the host computers operate in different datacenters (e.g., enterprise private datacenters and public cloud datacenters). In some such embodiments, the management plane servers, the control plane servers, the service managers and/or the host computer modules (e.g., the LCPs, SVMs, GVMs, hypervisor modules, etc.) communicate with each other through network infrastructures outside of their respective datacenters. Also, some such embodiments implement the service transport layer as distributed logical L3 routers and/or a network that spans multiple datacenters (e.g., multiple private datacenters, multiple public datacenters, and/or multiple private/public datacenters).

In order to properly control and coordinate the above described service chains, some embodiments provide a graphical user interface (GUI). The GUI of some embodiments allows a user to select a particular service chain. FIG. 20 illustrates an example of a GUI 2000 of some embodiments that displays the elements implementing a selected service chain. GUI 2000 includes service nodes 2010A-2030B, links 2035, and path selectors 2040A-2040C.

Each of the service nodes 2010A-2030B is a visual representation of a single instantiation of a service. Service nodes 2010A-2010C represent the nodes implementing a first service operation of the service chain, service nodes 2020A-2020B represent the nodes implementing a second service operation, and service nodes 2030A-2030B represent the nodes implementing a third service operation.

In the GUI 2000, between the various service nodes 2010A-2030B are links 2035 connecting each successive pair of service nodes. The links 2035 of GUI 2000 provide a concurrent representation of multiple service paths in the service chain. The links 2035 are shown in FIGS. 20-24 as directly connecting service nodes. However, one of ordinary skill in the art will understand that the links 2035 identify a sequence of service operations of the service chain rather than representing direct communication between service nodes. In some network systems, service nodes of a service path may communicate directly. However, in other network systems the processed data from one service node may be sent to one or more intermediary systems before those systems send the processed data to a service node implementing the next service in the service path. The links in the illustrated embodiments are shown between service nodes. However, in other embodiments, there may be similar links or other indicators after the end nodes of a service chain to indicate that data messages are sent to other entities (e.g., to be sent to other virtual machines on the network, machines external to the network, etc.) after the last service node in the service chain. Examples of links shown after end nodes of a service chain are illustrated in FIGS. 22A-22B.

Each service path of the service chain passes through one service node of each service. The GUI 2000 provides path selectors 2040A-2040C as user selectable objects that each allow a user to select a particular path to view. For reasons of clarity and space, the figures herein include path selectors for only the service nodes of the first service (i.e., service nodes 2010A-2010C). However, one of ordinary skill in the art will understand that other embodiments may include path selectors for other nodes. The effects of user interaction with the path selectors 2040A-2040C are further described with respect to FIGS. 22A and 22B, below.

In addition to displaying the service nodes and links of a service chain, the GUI 2000 also displays operational data for various components of the service chain in response to user commands. FIG. 21 illustrates the GUI 2000 displaying operational data for a service node in response to a user selection of the node. FIG. 21 shows service node representation 2020A, data display area 2110, and cursor 2120. Service node 2020A is one of two nodes that implement the second service in the service chain. Data display area 2110 is a pop-up display that shows operational data for service node 2020A. Cursor 2120 represents a cursor of a control device used to interact with the GUI 2000.

The GUI 2000 displays a data display area with operational data relating to a particular node when a user selects that node. As shown in this figure, the cursor 2120 hovers over (e.g., is placed over) the service node 2020A to select it. In response to the selection, the GUI 2000 provides the data display area 2110 with operational data relating to service node 2020A. The operational data in the display area 2110 may include a latency measurement of the node, throughput of the node, error measurements, or other data about the node.

In addition to latency and throughput, the display area of some embodiments may provide various types of additional counters, such as (1) initial counters, (2) error counters, (3) latency and byte counters, and/or (4) liveness check counters. Initial counters provide the user traffic and liveness packet counts to/from a service VM vNIC. Examples of initial counters in some embodiments include rx_from_svm, rx_from_iochain, to_svm_liveness_pkts, and from_svm_liveness_pkts. Error counters show measurements of different types of errors that can cause packet drop by the service VM proxy (e.g., the proxy sitting on the service VM vNIC). Examples of error counters in some embodiments include native_length_patch_fail, bad_eth_hdr, quench, get_nsh_fail, ingress_inject_fail, copy_fail, nsh_ttl_expired, zero_si, unknown_next_hop, set_nsh_fail, push_headroom_fail, insufficient_decap_bytes, insufficient_spf_bytes, insufficient_map_bytes, and partial_copy. Latency and byte counters can be utilized to show the cause of latency and the throughput per hop in the service chain. In some embodiments, the latency and byte counters can also be utilized for the whole chain (i.e., to evaluate the performance of one chain against the performance of another chain). The previously mentioned latency (shown in ms) and throughput counters are examples of latency and byte counters. Other examples of latency and byte counters in some embodiments include service_latency_seconds, rx_bytes_from_svm, and rx_bytes_from_iochain. The liveness check counters are used for debugging whether liveness checks are happening as expected (when liveness checks are enabled). Examples of liveness check counters in some embodiments include liveness_checks_performed, svm_alive, liveness_bitmap, and sequence_number.

In some embodiments, any or all of these counters can be exposed in the GUI and in some embodiments can be extracted from the same CLI. In some embodiments, all of the above counters can help with isolating any specific hop in the service chain, for determining packet drops, latency, throughput, and/or overall liveness status. How the operational data is generated is further described with respect to FIGS. 26-28, below.

Although the GUI of the illustrated embodiment displays operational data in response to a hover operation, one of ordinary skill in the art will understand that other user command operations are used in other embodiments. For example, in some embodiments, a cursor-click with a particular button, a keyboard hotkey, or a click-and-drag operation, rather than a hover operation, is used to select a node.

In the preceding figure, the GUI 2000 displays operational data for a single service node. However, the GUI of some embodiments allows a user to view operational data for all service nodes on a particular service path. FIG. 22A illustrates the example GUI 2000 displaying operational data for the nodes along a service path in response to a user selection of the service path. FIG. 22A shows path selectors 2040B, including path selector 2210, selected path 2215, data display areas 2220 and 2225, and aggregate display area 2228. As previously mentioned with respect to the links of FIG. 20, the illustrated paths of FIGS. 22A and 22B identify the order in which the services of the nodes they connect are performed and do not indicate direct communication between the linked nodes.

In the GUI 2000, the number of path selectors associated with a service node of the first service indicates how many service paths include that particular service node. Thus, the selectors indicate four paths that include service node 2010A, two paths that include service node 2010B, and three paths that include service node 2010C. In this figure, the user has selected path selector 2210, one of the service path selectors associated with service node 2010B, with a cursor operation (e.g., hover, click, etc.). In response to this selection, the GUI 2000 has highlighted the selected service path 2215. The path 2215 goes through service nodes 2010B, 2020A, and 2030B, then out of node 2030B. After leaving node 2030B, data packets are forwarded on toward their destinations. One of ordinary skill in the art will understand that although the illustrated implementation shows path 2215 extending beyond node 2030B, in other embodiments the display shows the path 2215 stopping at 2030B.

In some embodiments, the paths are bi-directional. That is, an initial set of packets are sent along the path, through service nodes 2010B, 2020A, and 2030B in that order, while reply packets (packets with source and destination reversed) are sent back through the same service path in the reverse order, through service nodes 2030B, 2020A, and 2010B in that order.

The GUI 2000 provides operational data for each service node along the selected path 2215. That is, the GUI 2000 displays operational data for: (1) node 2010B in display area 2220; (2) node 2020A in display area 2110; and (3) node 2030B in display area 2225. Thus the user can determine the operational performance of all nodes along the service path by reading the operational statistics for each node. In addition to displaying operational data for each individual node in the path, the GUI 2000 also displays an aggregate display area 2228. The aggregate display area 2228 provides operational data that is the aggregate of the data of the individual nodes along the path. The display area 2228 identifies which service path is currently selected for display (here, Service Path 6) and the policy rule that causes data to be sent along this path (here, Policy Rule 12). The counters of the aggregate display area 2228 are individually computed according to stored formulas. How an aggregate value is determined depends on what value is being measured. For example, the overall latency of a path is the sum of the latencies of the individual nodes. However, the throughput of a path is the minimum throughput of the individual nodes rather than the sum of the throughputs.
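The two aggregation formulas given above (latency summed, throughput taken as the minimum) can be sketched as follows. The field names, units, and sample values are hypothetical and are not taken from the figures.

```python
from typing import Dict, List


def aggregate_path_stats(node_stats: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-node statistics into path-level values.

    Latency accumulates across hops, so it is summed. Throughput is bounded
    by the slowest hop, so the minimum is taken.
    """
    return {
        "latency_ms": sum(node["latency_ms"] for node in node_stats),
        "throughput_mbps": min(node["throughput_mbps"] for node in node_stats),
    }


# Hypothetical per-node measurements for a three-hop service path.
node_stats = [
    {"latency_ms": 12.5, "throughput_mbps": 940.0},
    {"latency_ms": 18.0, "throughput_mbps": 610.0},
    {"latency_ms": 34.5, "throughput_mbps": 780.0},
]
path_stats = aggregate_path_stats(node_stats)
```

For these sample values the path latency is 65.0 ms and the path throughput is 610.0 Mbps, the latter set by the second hop.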

In order to allow problematic nodes to be identified more easily than by reading all the operational statistics for each node, the GUI of some embodiments provides a visual (e.g., color coded) indicator of whether the nodes are operating within preset thresholds. FIG. 22B illustrates an example GUI displaying a visual indicator of a poorly functioning node. FIG. 22B shows path selector 2230 (one of the set of path selectors 2040A), selected path 2235, data display areas 2240 and 2245, aggregate display area 2248, and visual indicators 2250-2265.

In this figure, in response to a user selection of path selector 2230, the GUI 2000 has highlighted the selected service path 2235. The path 2235 goes through service nodes 2010A, 2020A, and 2030A. The operational data displayed by the GUI 2000 includes color coded visual indicators 2250, 2255, and 2260 in addition to the numerical indicators of service node performance. The specific color of each indicator 2250, 2255, and 2260 identifies whether an operational statistic of its corresponding node has gone beyond a threshold. In the illustrated example, a threshold latency of 30 ms has been set as an upper limit. Accordingly, as nodes 2010A and 2020A each have a latency below 30 ms (as shown in display areas 2240 and 2110), the indicators 2250 and 2255 are white circles. In contrast, node 2030A has a latency of 34.5 ms (as shown in display area 2245). Because 34.5 ms is above the threshold, the indicator 2260 is a gray circle to draw the user's attention to node 2030A.

The thresholds of some embodiments are set by a user. In other embodiments, the threshold may be set by a network virtualization manager or other network system (e.g., based on past performance data, pre-programmed threshold levels, etc.). Some embodiments provide multiple thresholds and multiple colors/patterns to indicate them. For example, in some embodiments, a node with latency below a 30 ms threshold will have a green indicator, a node with a latency between 30 and 50 ms will have a yellow indicator, and a node with a latency above 50 ms will have a red indicator.
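The multi-threshold example above can be sketched as a small classification function. The function name, the parameter names, and the handling of values exactly equal to 30 ms or 50 ms are assumptions; the text does not say which indicator applies at the boundaries.

```python
def latency_indicator(latency_ms: float,
                      warn_ms: float = 30.0,
                      critical_ms: float = 50.0) -> str:
    """Map a node latency to an indicator color.

    Assumption: the boundary values themselves (exactly warn_ms or
    critical_ms) fall in the middle band.
    """
    if warn_ms > critical_ms:
        raise ValueError("warn_ms must not exceed critical_ms")
    if latency_ms < warn_ms:
        return "green"
    if latency_ms <= critical_ms:
        return "yellow"
    return "red"
```

With the default thresholds, a latency of 34.5 ms falls in the middle band.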

Aggregate display area 2248 also includes a color coded visual indicator 2265. Here, the indicator is a gray circle. Various embodiments may display a colored visual indicator in the aggregate display area for different reasons. For example, in some embodiments, the color of a visual indicator may indicate that an aggregate statistic exceeds a particular threshold (e.g., total latency for the path is above 60 ms). Alternatively, or in addition to identifying an out-of-bounds aggregate statistic, the visual indicator may be set to the color of the node with the worst performance (in this figure, node 2030A).

In addition to or instead of path selectors for selecting a service path, the GUI of some embodiments provides controls that allow a user to select an entire set of service nodes that each implement a particular service operation. FIG. 23 illustrates GUI 2000 displaying operational data for all the nodes that provide a particular service in response to a user selection of one of the nodes implementing that service. The figure shows cursor 2120, node 2020A, display areas 2110 and 2310, and service control object 2320.

In FIG. 23, the GUI 2000 receives a cursor click that selects node 2020A. In response, the GUI 2000 displays operational data for node 2020A in display area 2110, and operational data for node 2020B (which provides the same service) in display area 2310. By displaying operational data for an entire service, the GUI 2000 allows a user to determine whether that service is inadequate for the data coming through the service chain and whether individual nodes of the service are functioning properly. When the service is performing inadequately, the user may decide to add additional nodes for the service (service 2). Accordingly, the GUI 2000 provides a service control object 2320 for that service. Further details about the control object 2320 are illustrated in the following figures.

FIG. 24 illustrates GUI 2000 adding an additional service node for the selected service in response to a user command. In this figure, the cursor 2120 has just clicked on service control object 2320. In response, a new service node and at least one new service path have been added to the service chain. The GUI 2000 provides a new service node representation 2410 for the newly implemented node. Adding new node 2410 also creates new service paths. Previously, with two nodes of service 2 and two nodes of service 3, each node of service 1 was the start of four (two options for service 2 times two options for service 3) potential service paths of the service chain. In FIG. 24, with three nodes for service 2, each node of service 1 is the start of six (three options for service 2 times two options for service 3) potential service paths of the service chain. Accordingly, the GUI has also added two new path selectors 2430 to each set of path selectors 2040A-2040C to identify new paths that use the new links 2440 that run through new service node 2410.
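The path counts described above follow from taking one node per service in order. A minimal sketch, assuming every node of one service can link to every node of the next; the node labels are hypothetical and are not the reference numerals of the figures:

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_service_paths(nodes_per_service: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Return every potential service path as a tuple with one node per service.

    Assumption: any node of a service may be followed by any node of the
    next service, so the paths are the Cartesian product of the node lists.
    """
    return list(product(*nodes_per_service))


# Hypothetical labels: three nodes of service 1, two each of services 2 and 3.
before = enumerate_service_paths([["s1-a", "s1-b", "s1-c"], ["s2-a", "s2-b"], ["s3-a", "s3-b"]])
# Same chain after a third node is added for service 2.
after = enumerate_service_paths([["s1-a", "s1-b", "s1-c"], ["s2-a", "s2-b", "s2-c"], ["s3-a", "s3-b"]])

paths_from_first_node_before = [p for p in before if p[0] == "s1-a"]  # 2 x 2 paths
paths_from_first_node_after = [p for p in after if p[0] == "s1-a"]    # 3 x 2 paths
```

The difference between the two per-node lists is the two new paths per service 1 node, which matches the two new path selectors added to each set.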

The GUIs of some embodiments use a different pattern or color from previously displayed paths, or use other visual indicators for newly added paths. For example, the new links 2440 and the path selectors 2430 are shown in FIG. 24 with dotted lines as an indicator of a newly added link/node. The GUIs of some other embodiments do not visually differentiate new links/nodes from previously existing links/nodes.

In addition to or instead of providing controls to add a node, some embodiments allow a user to command that an operational node restart or shut down. GUIs of different embodiments receive restart or shutdown commands in different ways, such as (1) through a particular cursor operation on the node to be restarted/shut down, (2) selection of a shutdown option from a menu of options, or (3) some combination of cursor options and menu items. In some such embodiments, a central controller will flag (as inactive) paths through a node that is about to be restarted or shut down, in order to stop data traffic from using that node while it is restarting/shutting down.

The above figures illustrate a GUI displaying a particular service chain. The GUIs of some embodiments allow a user to select one out of multiple service chains of a network. FIG. 25 illustrates a process 2500 of some embodiments for selecting a service chain, then receiving and displaying operational data for components that implement that service chain. The process 2500 starts when a server of the network (e.g., a server implementing a network controller or manager) receives (at 2505) a selection of a service chain. The service chain may be one of multiple service chains operating on a network or network segment. In some embodiments, the selection can be received via a menu of service chains, a set of graphical representations of service chains, or some other selection method.

After a service chain is selected, the process 2500 then identifies (at 2510) data collected for components of that service chain. In some embodiments, the process 2500 identifies this data in a database such as the one described with respect to FIG. 28, below. This data may include throughput values, latency values, other statistical data collected by proxy servers or calculated from measurements by the proxy servers, averages or weighted averages of such statistical data, and/or other operational data about service nodes (e.g., what type of packets each service node processes, what service is provided by each service node, etc.). In some embodiments, the process 2500 retrieves the identified data in this operation. In other embodiments, the process identifies which data in the database should be accessed for each component, but does not retrieve the operational data for a component until the component is selected by a user.

After identifying the relevant data for the selected service chain, the process 2500 then generates (at 2515) a graphical display of the service chain and its components. In some embodiments, as described with respect to FIGS. 20-24, above, the graphical display includes representations of the service nodes implementing a service chain and links between the nodes.

The process 2500 receives (at 2520) a selection of a service chain component (e.g., as shown in FIG. 21, above). In some embodiments, multiple service chain components can be selected at the same time (e.g., as shown in FIGS. 22A-23, above). The process then displays (at 2525) operational data collected for the selected service chain component(s). In some embodiments, displaying the operational data in operation 2525 includes periodically updating the displayed data as new data about the selected component(s) becomes available.

After displaying the operational data for the selected component(s), the process 2500 then determines (at 2530) whether there are any more selections of service chain components. If there are more selections, the process 2500 returns to operation 2525 and displays operational data for the selected service chain component. If there are no further selections, the process ends. In some embodiments, when the process 2500 ends, the GUI will continue to display the operational data for the previously selected components and may also update the displayed operational data.

Although the above described process 2500 is in a particular order for purposes of description, one of ordinary skill in the art will understand that other embodiments may perform such a process in a different order. FIGS. 20-25 describe GUI displays in terms of components of a single service chain. However, one of ordinary skill in the art will understand that a particular service node may implement part of multiple service chains, rather than being exclusive to one service chain.

As previously mentioned, the operational data displayed in the above described GUIs may be collected by service proxies, each of which monitors a corresponding service node. The following three figures describe a service proxy and processes for collating and analyzing the data from these service proxies.

Each service proxy in some embodiments monitors its respective service node to determine whether the service node is functioning and to generate multiple statistics regarding the operation of its service node. FIG. 26 illustrates an example of a service proxy 2610 monitoring its service node 2605. As shown, the service proxy 2610 passes data packets 2620 to and from the service node 2605. While doing this, the service proxy in some embodiments generates statistics regarding the number of data packets 2620, the size (e.g., bits, bytes, etc.) of the payloads of those packets, and the throughput of the service node. In some embodiments, the service proxy computes a throughput value by dividing the number of bits/bytes in a packet's payload by the time that it takes the service node to process the packet (i.e., by the duration of time from when the packet leaves the service proxy to when it returns to the service proxy from the service node). Some embodiments compute a throughput value of a service node by measuring the total amount of data (in bits or bytes) in all payloads of all data messages that are processed by a service node over some regular period (e.g., 1 second, 5 seconds, 10 seconds, etc.) and then dividing by the duration of that period. Other embodiments may count the bits/bytes in the payloads of data messages until some threshold number of bits/bytes are processed, while measuring the time the service node takes to process those data messages, then divide the number of bits/bytes processed by the measured time. Other embodiments may measure the number of bits/bytes in a set number of packets (e.g., 1 packet, 1024 packets, etc.) processed by the service node while measuring the time it takes the service node to process that number of packets, then divide the bits/bytes in those packets by the time it took the service node to process them.
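Three of the throughput computations described above can be sketched as follows. The function names, the use of bytes and seconds as units, and the representation of samples as (payload size, processing time) pairs are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def per_packet_throughput(payload_bytes: int, sent_at: float, returned_at: float) -> float:
    """Payload size divided by the round trip from proxy to node and back."""
    elapsed = returned_at - sent_at
    if elapsed <= 0:
        raise ValueError("packet must return after it was sent")
    return payload_bytes / elapsed


def windowed_throughput(payload_sizes: Iterable[int], window_seconds: float) -> float:
    """Total payload bytes processed in a fixed period divided by the period length."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return sum(payload_sizes) / window_seconds


def threshold_throughput(samples: Iterable[Tuple[int, float]], byte_threshold: int) -> float:
    """Accumulate (payload_bytes, processing_seconds) samples until the byte
    threshold is reached, then divide the bytes counted by the time measured."""
    total_bytes = 0
    total_seconds = 0.0
    for payload_bytes, processing_seconds in samples:
        total_bytes += payload_bytes
        total_seconds += processing_seconds
        if total_bytes >= byte_threshold:
            break
    if total_seconds <= 0:
        raise ValueError("no processing time was measured")
    return total_bytes / total_seconds
```

The fixed-packet-count variant has the same shape as `threshold_throughput`, with the stop condition being a number of packets rather than a number of bytes.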

In some embodiments, the service proxy 2610 stores the generated statistics (e.g., number of data packets, number of bytes, and throughput) in a data storage 2615 of the service proxy 2610. As mentioned above, the service proxy in some embodiments periodically exchanges liveness messages 2625 to perform periodic liveness checks to ensure that the service node is operational. From these liveness messaging exchanges, the service proxy generates latency metric values that gauge the responsiveness of the service node, and stores an average latency metric value in its data storage 2615. Periodically, the service proxy 2610 provides the statistics (e.g., the latency values, the throughput, etc.) that it maintains in its storage 2615 to the controller cluster, which then assesses this data to determine whether a service chain's implementation should be modified.

FIG. 27 conceptually illustrates a process 2700 that the service proxy performs in some embodiments to calculate and report statistics for its associated service node. In some embodiments, the service proxy continuously performs the process 2700 while the proxy and its corresponding service node are operational. The process 2700 starts when the process generates (at 2705) a statistical measurement regarding the operation of its service node. In some embodiments, the measurement generated at 2705 can be either a latency value or a throughput value of the service node. The process 2700 then stores (at 2710) the generated measurement in the data storage 2615 in FIG. 26 when it is the first measurement of its kind (i.e., it is the first latency value or first throughput value) for the reporting period.

When this measurement is not the first measurement of its kind for the reporting period, the process averages (at 2710) the generated measurement (e.g., the generated latency) with the previous measurement or average measurement (e.g., with the previous latency or average latency) stored in the data storage 2615. In some embodiments, the service proxy calculates weighted averages of each measurement that are more heavily weighted toward recent throughput or latency measurements (i.e., it uses a larger weight value for the measurement value computed at 2705 than for the measurement value or average measurement value retrieved from the storage 2615). After generating an average measurement value at 2710, the process (at 2710) stores the average measurement value back in the data storage 2615.

The process 2700 next determines (at 2715) whether it is time to report the statistics. If it is not time to report the statistics, the process 2700 returns to operation 2705, where it waits until it generates another measurement value (e.g., another latency or throughput value) for its service node. On the other hand, when the process determines (at 2715) that it is time to report the generated statistics, the process 2700 reports (at 2720) the generated measurements (e.g., the generated latency and throughput measurements) to the controller cluster. It then returns to operation 2705, where it waits until it generates another measurement value (e.g., another latency or throughput value) for its service node.
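The store-or-average step (2710) and the reporting step (2720) of process 2700 can be sketched as a small class. The class name, the 0.75 default weight, and the choice to clear stored values after each report (so that the next measurement counts as the first of its kind for the new reporting period) are assumptions.

```python
from typing import Dict


class ProxyStats:
    """Hypothetical stand-in for the proxy's data storage 2615."""

    def __init__(self, recent_weight: float = 0.75) -> None:
        # A weight above 0.5 favors the newest measurement over the stored one.
        if not 0.0 < recent_weight <= 1.0:
            raise ValueError("recent_weight must be in (0, 1]")
        self._recent_weight = recent_weight
        self._stored: Dict[str, float] = {}

    def record(self, kind: str, value: float) -> float:
        """Operation 2710: store the first value of a kind, else blend it in."""
        previous = self._stored.get(kind)
        if previous is None:
            self._stored[kind] = value
        else:
            w = self._recent_weight
            self._stored[kind] = w * value + (1.0 - w) * previous
        return self._stored[kind]

    def report(self) -> Dict[str, float]:
        """Operation 2720: hand the current values to the caller and start
        a new reporting period (assumed behavior)."""
        snapshot = dict(self._stored)
        self._stored.clear()
        return snapshot
```

For example, recording latencies of 20.0 and then 40.0 with a weight of 0.75 leaves a stored value of 0.75 × 40.0 + 0.25 × 20.0 = 35.0.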

After the service proxies of some embodiments collect operational statistics, these statistics are sent to a central controller for further aggregation, storage in a database, and analysis. The database of some embodiments may be used to provide operational data to a user, as previously described. In addition to or instead of using the database for providing operational data to a user, the aggregated statistics in the database may be analyzed in order to automatically adjust a service chain for better results.

FIG. 28 conceptually illustrates a process 2800 of some embodiments for receiving, storing, and analyzing, at a central controller, statistics from a service proxy. The process 2800 starts when the central controller receives (at 2805) operational statistics from a service proxy. In some embodiments, these operational statistics are forwarded periodically by the service proxy to the controller cluster. In other embodiments, the operational statistics may be received after the controller commands the service proxy to send the statistics.

After receiving the statistics from the service proxy, the process 2800 stores (at 2810) them in a database. If any previous statistics from that proxy are stored in the database, the process generates blended averages of the statistics and stores the blended averages in the database. By storing blended averages, the central controller further smooths the data received from the service proxy. After storing the statistics, the process 2800 determines (at 2815) whether it is time to analyze them. If it is not time to analyze the stored statistics, then the process 2800 returns to operation 2805 to receive more operational statistics. If it is time to analyze the stored statistics, then the process 2800 calls (at 2820) one or more analysis engines to analyze the collected statistics. The central controller will modify service paths or nodes if the analysis indicates that changes to the service paths are needed. One of ordinary skill in the art will understand that although one iteration through operations 2805-2815 receives and blends statistics from a single node, the analysis engines called in operation 2820 may analyze statistics from multiple nodes (e.g., all nodes in a service chain). The network of some embodiments uses a single server (e.g., a server implementing the central controller) to both collect and analyze the data relating to a particular service chain. In other embodiments, the analysis engines run on a separate server or servers from the central controller. Examples of analysis of statistics of nodes and modification of service paths and service nodes are described in the following three figures.
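Operations 2805-2815 can be sketched as an in-memory store that blends each incoming report with what it already holds and signals when analysis is due. The class name, the equal-weight blend, and the use of a report count (rather than a timer) to decide when it is time to analyze are all assumptions; the text does not specify how the blend or the timing is computed.

```python
from typing import Dict


class ControllerStatsStore:
    """Hypothetical stand-in for the controller's statistics database."""

    def __init__(self, blend: float = 0.5, reports_per_analysis: int = 2) -> None:
        if not 0.0 < blend <= 1.0:
            raise ValueError("blend must be in (0, 1]")
        if reports_per_analysis < 1:
            raise ValueError("reports_per_analysis must be at least 1")
        self._blend = blend
        self._reports_per_analysis = reports_per_analysis
        self._reports_since_analysis = 0
        self.rows: Dict[str, Dict[str, float]] = {}

    def ingest(self, node_id: str, stats: Dict[str, float]) -> bool:
        """Store or blend one proxy report (2805, 2810).

        Returns True when the analysis engines should be called (2815).
        """
        row = self.rows.setdefault(node_id, {})
        for metric, value in stats.items():
            if metric in row:
                row[metric] = self._blend * value + (1.0 - self._blend) * row[metric]
            else:
                row[metric] = value
        self._reports_since_analysis += 1
        if self._reports_since_analysis >= self._reports_per_analysis:
            self._reports_since_analysis = 0
            return True
        return False
```

When `ingest` returns True, an analysis engine would typically read all of `rows`, since the analysis may span every node of a service chain rather than only the node that just reported.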

One problem encountered with service chains is a failure of a single service node. When this failure results in a service node that is inoperative, too slow, generating too many errors, or otherwise falling below a performance threshold, the system of some embodiments alters the service paths of the service chain by restarting or remediating the failing service node.

FIG. 29 illustrates a process 2900 for automatically restarting/remediating an operational service node when the particular service node fails to meet a threshold. Failing to meet the threshold could include, for example, producing excessive errors, operating too slowly, pausing unexpectedly, etc.

The process 2900 analyzes (at 2905) data transmission characteristics of operational service nodes of a service. The data transmission characteristics are received from service proxies for the nodes. Although the process 2900 analyzes nodes of a particular service, in some embodiments a similar process analyzes all nodes in a service chain. As mentioned with respect to FIG. 28, above, the analysis may be performed by an analysis engine called by a central controller.

The process 2900 selects (at 2910) a particular node. The process 2900 of some embodiments may use some pre-filtering to select the node to be examined. For example, in some embodiments, an analysis engine may select a node if the service proxy of that node has flagged it as failing. In other embodiments, the process 2900 selects each node of a service until all the nodes have been selected.

If the performance of the node meets (at 2915) a threshold, then the process 2900 proceeds to operation 2930. If the performance of the service node fails to meet (at 2915) the threshold, then the process 2900 proceeds to operation 2920. The process 2900 then flags (at 2920) the service paths using the failing node as inactive. The service paths are flagged as inactive in order to stop data traffic through the node before restarting or otherwise remediating whatever problem is causing the node to fail. The networks of some embodiments use a centralized service to generate a service path index. The networks of other embodiments generate service paths dynamically in a decentralized manner. In embodiments that use a single service path index, the process 2900 flags service paths as inactive by identifying which specific paths in the service path index pass through the failing node and designating those paths as inactive. In embodiments where the service paths are generated dynamically in a decentralized manner, the process 2900 flags service paths as inactive by notifying the service path generators not to use the failing node as part of a service path.

The process then restarts the node or performs other actions that remediate the problem with the node (at 2925). If (at 2930) the process has not analyzed all the nodes, then the process returns to operation 2910 and selects another node. If (at 2930) the process has analyzed all the nodes, then the process 2900 ends.
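The loop of operations 2910-2930 can be sketched for the embodiment that uses a single service path index. The `ServicePath` type, the use of latency as the only performance measure, and the `restart` callback are assumptions; a real remediation step is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ServicePath:
    """Hypothetical entry in a centralized service path index."""
    path_id: int
    node_ids: Tuple[str, ...]
    active: bool = True


def remediate_failing_nodes(node_latency_ms: Dict[str, float],
                            path_index: List[ServicePath],
                            max_latency_ms: float,
                            restart: Callable[[str], None]) -> List[str]:
    """Check each node, deactivate paths through failing nodes, then restart them."""
    remediated = []
    for node_id, latency in node_latency_ms.items():  # 2910 and 2930
        if latency <= max_latency_ms:                 # 2915: threshold met
            continue
        for path in path_index:                       # 2920: stop traffic first
            if node_id in path.node_ids:
                path.active = False
        restart(node_id)                              # 2925
        remediated.append(node_id)
    return remediated
```

Flagging the paths before calling `restart` preserves the ordering described above, in which traffic is stopped before the node is remediated. Reactivating the paths afterwards is not shown, since the text does not describe it here.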

Even when all nodes of a service operation are functioning properly, the load on that service may be more than the existing nodes of the service can handle. Accordingly, in some embodiments, analysis is directed toward a service operation as a whole, rather than at individual nodes of that service (e.g., after the individual nodes are found to be working properly). In such cases, the method of some embodiments adds new service nodes and new service paths to handle the load. Instantiating the new service node in some embodiments is in response to operational data indicating that a throughput or latency of one or more service nodes implementing the particular service is beyond a threshold (e.g., lower than a set minimum threshold or above a set maximum threshold).

FIG. 30 illustrates a process 3000 for automatically adding a new service node when the collective performance of the service nodes implementing a service fails to meet a threshold. The process 3000 is used in some embodiments to identify and remediate cases where the nodes that perform a particular service are collectively overwhelmed by the amount of data to be processed by that service.

The process 3000 analyzes (at 3005) data transmission characteristics of operational service nodes of a service. The data transmission characteristics are received from service proxies for the nodes. If the performance of the service nodes performing that service meets (at 3010) a threshold, then the process 3000 ends. If the performance of the service nodes of a service fails to meet (at 3010) the threshold, then the process 3000 proceeds to operation 3015.

The process 3000 then instantiates (at 3015) a new node for the service. The process 3000 also defines (at 3020) new service paths through the new operational node. In some embodiments, defining new service paths for a new operational node includes eliminating one or more existing service paths through already operational service nodes. In other embodiments, existing service paths are not eliminated, but the amount of data messages through the paths is reduced. The central controllers of some embodiments may reduce the amount of data messages through existing paths and increase the amount of data messages through new paths by adjusting a set of weighting values in selection metrics that are then distributed to host machines of the network that use those selection metrics to determine when to add a new data flow to a service path.
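The decision at 3010 and the weight adjustment that follows 3020 can be sketched as below. Measuring collective performance as the mean latency of the service's nodes, and representing the selection metrics as one numeric weight per path, are assumptions; the function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def service_needs_new_node(node_latency_ms: Dict[str, float], max_mean_latency_ms: float) -> bool:
    """Operation 3010 under the assumption that the collective measure is mean latency."""
    return mean(list(node_latency_ms.values())) > max_mean_latency_ms


def shift_weight_to_new_paths(weights: Dict[str, float],
                              new_path_ids: List[str],
                              share: float) -> Dict[str, float]:
    """Scale existing path weights down and give `share` of the total to new paths.

    Existing paths keep their relative proportions; the new paths split
    `share` evenly. The returned weights sum to 1.
    """
    if not new_path_ids or not 0.0 < share < 1.0:
        raise ValueError("need at least one new path and a share in (0, 1)")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("existing weights must sum to a positive value")
    adjusted = {p: (w / total) * (1.0 - share) for p, w in weights.items()}
    per_new_path = share / len(new_path_ids)
    for path_id in new_path_ids:
        adjusted[path_id] = per_new_path
    return adjusted
```

This sketch corresponds to the embodiments that keep existing paths and reduce their traffic; eliminating an existing path would amount to dropping its entry before the weights are rescaled.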

Another issue for a network implementing service chains is that the flow of data traffic for a particular service chain will not be constant. When more data traffic is sent through a service chain, the network may add additional service nodes to handle the increased traffic. However, when the data traffic drops for a particular chain, there may be more service nodes for a particular service than are needed to process the data traffic at that time. Accordingly, the method of some embodiments may deactivate an existing service node that is implementing a particular service and eliminate the paths through that node. Deactivating the existing service node in some embodiments may be in response to operational data indicating that a throughput or latency of one or more service nodes implementing the particular service is beyond a threshold.

In some cases, a particular node may be underused, not because there is too little data traffic for the service, but because other nodes of the service are being overused. Therefore, in some embodiments, when a node is found to be underused, the analysis engines determine whether other nodes of that service are overused (indicating that rebalancing of the data allocation is needed). If none of the nodes are overused, then the underused node is shut down.

FIG. 31 illustrates a process 3100 for automatically shifting data traffic to underused nodes of a service or recovering resources from an underused node of a service. Some embodiments shift data traffic to an underused node by (1) eliminating or reducing the use of paths through other, overused, nodes of the service and (2) instantiating new paths that pass through the underused node.

The process 3100 analyzes (at 3105) data transmission characteristics of operational service nodes of a service. The data transmission characteristics are received from service proxies for the nodes. The process 3100 determines (at 3110) whether any node is underused. An example of an indicator of an underused node would be a low throughput resulting from a low amount of data messages received rather than from a delay in processing the data messages that are received.

If no node is underused, the process 3100 ends. If a node is underused, the process 3100 determines (at 3115) whether any other node(s) is/are overused. In some embodiments, a node may be overused because too many service paths are using that node. In some embodiments, in addition to or instead of too many service paths using a node, the service paths using that node may be handling too much data.

If another node is overused, the process eliminates (at 3120), or reduces the amount of data message traffic sent on, one or more paths through the overused node. Some or all of the data message traffic through those paths is then routed through existing or new path(s) through the underused node. That is, the process increases the load of the underused node while decreasing the load of the overused node. If no other node is overused, the process shuts down (at 3125) the underused node. In some embodiments, shutting down the underused node is preceded by an operation to flag the paths through the node as inactive, as described with respect to operation 2920 of FIG. 29, above.
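The branching of process 3100 (operations 3110-3125) can be sketched as a planning function that returns which action to take. Expressing load as a utilization fraction between 0 and 1, the 0.2 and 0.8 thresholds, and handling only the least-used node per pass are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def plan_rebalance(utilization: Dict[str, float],
                   low: float = 0.2,
                   high: float = 0.8) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (action, overused_node, underused_node) for one service.

    action is "none" when no node is underused (3110), "shift" when traffic
    should move from an overused node to the underused node (3120), and
    "shutdown" when the underused node should be shut down (3125).
    """
    underused = [n for n, u in utilization.items() if u < low]
    if not underused:
        return ("none", None, None)
    target = min(underused, key=lambda n: utilization[n])
    overused = [n for n, u in utilization.items() if u > high]  # 3115
    if overused:
        source = max(overused, key=lambda n: utilization[n])
        return ("shift", source, target)
    return ("shutdown", None, target)
```

A caller acting on "shutdown" would, in some embodiments, first flag the paths through the node as inactive, as in operation 2920.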

In some embodiments, the data sent along a particular path is sent on a per flow basis. That is, when a new flow (e.g., a flow defined by a five-tuple with source and destination IP addresses, source and destination port numbers, and packet protocol, or a tuple with fewer/more attributes) is received by the service chain, the flow is assigned to a particular path by a service insertion pre-processor of a host. The service insertion pre-processor of some embodiments uses selection metrics stored in a path table on the host to determine which path to assign a particular flow to. In some embodiments, the selection metrics are set by a network controller on a server, such as the central controller described with respect to FIG. 28. The central controller of some embodiments, after having the collected statistics analyzed, changes the weights of the selection criteria in the selection metric in favor of paths through underutilized nodes and against paths through overused nodes. The network controller then distributes the updated selection metrics (e.g., through a rule distributor 1814 of FIG. 18) to the path tables of the hosts.
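Per-flow path assignment driven by weighted selection metrics can be sketched as follows. Treating the selection metric as a single weight per path, choosing the path by weighted random selection, and keeping existing flows on their assigned path after a weight update are all assumptions; the `PathTable` and `FiveTuple` names are hypothetical.

```python
import random
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class PathTable:
    """Hypothetical host-side path table consulted for each new flow."""

    def __init__(self, weights: Dict[str, float], seed: Optional[int] = None) -> None:
        self._weights = dict(weights)
        self._flow_to_path: Dict[FiveTuple, str] = {}
        self._rng = random.Random(seed)

    def update_weights(self, weights: Dict[str, float]) -> None:
        """Apply selection metrics distributed by the controller.

        Only flows assigned after the update are affected (assumption).
        """
        self._weights = dict(weights)

    def assign(self, flow: FiveTuple) -> str:
        """Return the path for a flow, choosing one only the first time it is seen."""
        if flow not in self._flow_to_path:
            path_ids = list(self._weights)
            path_weights = [self._weights[p] for p in path_ids]
            self._flow_to_path[flow] = self._rng.choices(path_ids, weights=path_weights, k=1)[0]
        return self._flow_to_path[flow]
```

Raising the weight of paths through underused nodes and lowering the weight of paths through overused nodes then changes where new flows land, without moving flows that are already in progress.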

Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, RAM chips, hard drives, EPROMs, etc. The computer readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.

In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.

FIG. 32 conceptually illustrates a computer system 3200 with which some embodiments of the invention are implemented. The computer system 3200 can be used to implement any of the above-described hosts, controllers, and managers. As such, it can be used to execute any of the above described processes. This computer system includes various types of non-transitory machine readable media and interfaces for various other types of machine readable media. Computer system 3200 includes a bus 3205, processing unit(s) 3210, a system memory 3225, a read-only memory 3230, a permanent storage device 3235, input devices 3240, and output devices 3245.

The bus 3205 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the computer system 3200. For instance, the bus 3205 communicatively connects the processing unit(s) 3210 with the read-only memory 3230, the system memory 3225, and the permanent storage device 3235.

From these various memory units, the processing unit(s) 3210 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments. The read-only-memory (ROM) 3230 stores static data and instructions that are needed by the processing unit(s) 3210 and other modules of the computer system. The permanent storage device 3235, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the computer system 3200 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 3235.

Other embodiments use a removable storage device (such as a flash drive, etc.) as the permanent storage device. Like the permanent storage device 3235, the system memory 3225 is a read-and-write memory device. However, unlike storage device 3235, the system memory is a volatile read-and-write memory, such as a random access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 3225, the permanent storage device 3235, and/or the read-only memory 3230. From these various memory units, the processing unit(s) 3210 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.

The bus 3205 also connects to the input and output devices 3240 and 3245. The input devices enable the user to communicate information and select commands to the computer system. The input devices 3240 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 3245 display images generated by the computer system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.

Finally, as shown in FIG. 32, bus 3205 also couples computer system 3200 to a network 3265 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of computer system 3200 may be used in conjunction with the invention.

Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, and any other optical or magnetic media. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.

As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral or transitory signals.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. For instance, several figures conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process.

Even though the service insertion rules in several of the above-described examples provide service chain identifiers, some of the inventions described herein can be implemented by having a service insertion rule provide the service identifiers (e.g., SPIs) of the different services specified by the service insertion rule. Similarly, several of the above-described embodiments perform distributed service routing that relies on each service hop identifying a next service hop by performing an exact match based on the SPI/SI values. However, some of the inventions described herein can be implemented by having the service insertion pre-processor embed all the service hop identifiers (e.g., service hop MAC addresses) as the data message's service attribute set and/or in the data message's encapsulating service header.

In addition, some embodiments decrement the SI value differently (e.g., at different times) than the approaches described above. Also, instead of performing the next hop lookup just based on the SPI and SI values, some embodiments perform this lookup based on the SPI, SI and service direction values as these embodiments use a common SPI value for both the forward and reverse directions of data messages flowing between two machines.
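By way of illustration only, the direction-aware next hop lookup described above can be sketched as an exact-match table keyed on the SPI, SI and service direction values. The sketch below is not taken from any described embodiment. The names `NextHop`, `build_table` and `lookup_next_hop`, the decrement of the SI value by one per hop, and the traversal of the hops in opposite order for the reverse direction are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# Hypothetical encodings of the service direction value.
FORWARD = 0
REVERSE = 1


class NextHop(NamedTuple):
    mac: str      # assumed identifier of the next service hop (e.g., a MAC address)
    next_si: int  # assumed SI value carried after this hop performs its service


# Exact-match key: (SPI, SI, direction).
NextHopTable = Dict[Tuple[int, int, int], NextHop]


def build_table(spi: int, hop_macs: List[str], start_si: int) -> NextHopTable:
    """Create entries for both directions of one service path under one SPI."""
    table: NextHopTable = {}
    count = len(hop_macs)
    for i, mac in enumerate(hop_macs):
        si = start_si - i
        # Forward direction: hops are visited in the listed order.
        table[(spi, si, FORWARD)] = NextHop(mac, si - 1)
        # Reverse direction (assumption): same SPI, hops visited in opposite order.
        table[(spi, si, REVERSE)] = NextHop(hop_macs[count - 1 - i], si - 1)
    return table


def lookup_next_hop(
    table: NextHopTable, spi: int, si: int, direction: int
) -> Optional[NextHop]:
    """Return the exact-match entry, or None when no entry exists."""
    return table.get((spi, si, direction))
```

In this sketch, a table built with `build_table(7, ["aa", "bb", "cc"], 255)` resolves the key (7, 255, FORWARD) to hop "aa" and the key (7, 255, REVERSE) to hop "cc". Because the direction value is part of the key, a single SPI value can serve both directions without the two sets of entries colliding.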

The above-described methodology is used in some embodiments to express path information in single tenant environments. Thus, one of ordinary skill will realize that some embodiments of the invention are equally applicable to single tenant datacenters. Conversely, in some embodiments, the above-described methodology is used to carry path information across different datacenters of different datacenter providers when one entity (e.g., one corporation) is a tenant in multiple different datacenters of different providers. In these embodiments, the tenant identifiers that are embedded in the tunnel headers have to be unique across the datacenters, or have to be translated when they traverse from one datacenter to the next. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

1-30. (canceled)
31. A method for monitoring and adjusting a service chain comprising a plurality of services to perform on data messages passing through a network, the method comprising: for a service chain implemented by a set of service paths each of which comprises a plurality of service nodes that implement the plurality of services of the service chain, each service node connecting to the network through a service proxy that forwards data messages between the network and the service node: receiving, from a plurality of service proxies, operational data relating to data transmission characteristics of a set of operational service nodes in the plurality of service nodes; analyzing the data transmission characteristics; in response to the analysis of the data transmission characteristics, altering the set of service paths implementing the service chain.
32. The method of claim 31, wherein altering the set of service paths comprises adding a new service path by instantiating a new service node to implement a particular service of the service chain for the new service path.
33. The method of claim 32, wherein instantiating the new service node is in response either (i) to operational data indicating that a throughput of one or more service nodes implementing the particular service is beyond a threshold throughput, or (ii) to operational data indicating that a latency of one or more service nodes implementing the particular service is beyond a threshold latency.
34. The method of claim 31, wherein altering the service paths comprises eliminating a service path by deactivating an existing service node that is implementing a particular service for the service path.
35. The method of claim 34, wherein deactivating the existing service node is in response either (i) to operational data indicating that a throughput of one or more service nodes implementing the particular service is beyond a threshold throughput, or (ii) to operational data indicating that a latency of one or more service nodes implementing the particular service is beyond a threshold latency.
36. The method of claim 31, wherein altering the service paths comprises restarting an existing service node that is implementing a particular service.
37. The method of claim 36, wherein the restarting is in response either (i) to operational data indicating errors in the existing service node, or (ii) to operational data indicating that the existing service node has fallen below a performance threshold.
38. The method of claim 31, wherein a particular service node implements parts of multiple service paths of the service chain.
39. The method of claim 38, wherein altering the service paths comprises reducing a number of service paths implemented in part by the particular service node.
40. The method of claim 31, wherein altering the service paths comprises increasing a number of service paths implemented in part by a particular service node.
41. The method of claim 31, wherein a particular service node and a particular service proxy for the particular service node execute on one host computer.
42. The method of claim 31, wherein the operational data includes operational statistics derived from the data transmission characteristics received from the set of service proxies.
43. The method of claim 42, wherein analyzing the data transmission characteristics comprises aggregating operational statistics data with previously collected operational statistics data.
44. The method of claim 43, wherein the aggregating comprises calculating a weighted sum of the operational statistics data.
45. The method of claim 42, wherein the operational statistics comprise data transmission characteristics of the service nodes received from the service proxies.
46. The method of claim 45, wherein the data transmission characteristics of a particular service node comprise at least one of a throughput of the particular service node, and a latency time for the particular service node.
47. The method of claim 31, wherein the service proxies collect data relating to packets received by their associated service nodes.
48. A non-transitory machine-readable medium storing a program which when executed by at least one processing unit monitors and adjusts a service chain comprising a plurality of services to perform on data messages passing through a network, the program comprising sets of instructions for: for a service chain implemented by a set of service paths each of which comprises a plurality of service nodes that implement the plurality of services of the service chain, each service node connecting to the network through a service proxy that forwards data messages between the network and the service node: receiving, from a plurality of service proxies, operational data relating to data transmission characteristics of a set of operational service nodes in the plurality of service nodes; analyzing the data transmission characteristics; in response to the analysis of the data transmission characteristics, altering the set of service paths implementing the service chain.
49. The non-transitory machine-readable medium of claim 48, wherein the set of instructions for altering the set of service paths comprises a set of instructions for adding a new service path by instantiating a new service node to implement a particular service of the service chain for the new service path.
50. The non-transitory machine-readable medium of claim 49, wherein instantiating the new service node is in response either (i) to operational data indicating that a throughput of one or more service nodes implementing the particular service is beyond a threshold throughput, or (ii) to operational data indicating that a latency of one or more service nodes implementing the particular service is beyond a threshold latency.